PREFACE


Computational methods for solving extremal problems have developed
very intensively in recent years.

The literature on these subjects now runs to hundreds of items. This
interest in the development of computational methods is not
accidental. It reflects the important role played by the finding of
extrema in diverse applied problems. The effective minimization of
a function under various constraints on the variables is the subject
matter of this book.

It should be stressed from the very beginning that recent years
have brought changes in the requirements to be met by new
computational algorithms. Some ten or fifteen years ago any new
algorithm for solving a minimization problem was received with
interest, but now the mere construction of a new algorithm is
insufficient. It is now necessary to show in what respect it is
better than the existing ones. Thus there arises the problem of
comparing the effectiveness of different algorithms. Unfortunately
this problem has no simple solution, because a criterion of
effectiveness must be chosen and the criteria may be diverse. For
instance, we can take as a criterion of effectiveness the accuracy
of the result obtained, the computing time required, the computer
storage required, etc. It is also often necessary to use rather
contradictory criteria in assessing an algorithm.

In selecting algorithms to be included in this book, the authors
based their choice on the accuracy of the result and the rate of
convergence of the iterative process. However, even with this
restriction it is not possible to order all the algorithms in one
and only one way and tell which of them is better or worse than
another. The reason is that the estimate of the rate of convergence
is made not for a particular problem but for a class of problems.
Therefore an algorithm which is poor as applied to a broad class of
problems can prove effective on a narrower one. This makes it
necessary for the practitioner to keep a large reserve of algorithms
and to apply them depending on the problem to be solved.

It is important to know what ensures a fast rate of convergence of
the algorithm. In practice, even calculating the first derivative
of a function quite often involves certain difficulties; these become
insurmountable when trying to calculate the second derivative.
Therefore special stress is laid on the description of algorithms
that require only the first derivative or only the value of the
function.

In describing the computational methods we consider only the
finite-dimensional case. This is due to two reasons. First, in using
a computer for calculations, the problem has to be approximated by
a finite-dimensional one anyway. Secondly, most of the known
algorithms can be generalized comparatively simply to the
minimization of functionals without essential changes. This approach
makes the book accessible to a broad circle of readers, since
grasping most of the results described requires only a knowledge of
the principles of mathematical analysis and linear algebra.

To avoid the necessity of frequent cross-referencing, few references
are given in the text. Short bibliographic notes follow some of the
chapters. The authors did not attempt to cover all the literature on
the questions treated, this being simply impossible because of its
vastness. This is why the list of literature given at the end of
the book includes only papers and monographs directly used in
writing this book.

It should be noted that the authors have not discussed the methods
of solving a broad and important class of ill-posed (incorrectly
posed) extremal problems, which are treated in the works of
A. N. Tikhonov and his followers. The authors have only slightly
touched on the solving of optimal control problems. These problems
have been studied from various points of view and the methods for
their solution are given in N. N. Moiseev's monograph Numerical
Methods in the Theory of Optimal Systems.

The algorithms set forth below are iterative in character. This
means that we construct a finite or infinite sequence of points
$x_k$, $k = 0, 1, \ldots$, which is said to converge to the solution
of a minimization problem.

The points of the sequence are related by the equation

$$x_{k+1} = x_k + \alpha_k p_k$$

where $p_k$ is the vector of shift from point $x_k$ and $\alpha_k$ is
the step along the direction of $p_k$. Therefore the description of
any of the algorithms given below consists in establishing the method
of choosing the vector $p_k$ and the length of the step $\alpha_k$. It
should be noted that the method of choosing the vector $p_k$
determines the general rate of convergence of the process, and the
method of choosing $\alpha_k$ has an important influence on the amount
of calculation at each iteration. Therefore the authors' aim was to
give, in every case, a method of choosing $\alpha_k$ such that the
required value of $\alpha_k$ can be found after a finite number of
iterations without affecting the general rate of convergence.
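
The iterative scheme can be illustrated by a minimal sketch. It is
not one of the algorithms of this book: as an assumption made only
for illustration, the shift vector is taken to be the antigradient
and the step is found by repeated halving until a sufficient decrease
is reached. The names `descend`, `f` and `grad_f` and the test
function are hypothetical.

```python
def descend(f, grad, x0, max_iter=500, tol=1e-10):
    """Iterate x_{k+1} = x_k + alpha_k * p_k with p_k = -f'(x_k)."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        g_sq = sum(gi * gi for gi in g)
        if g_sq <= tol * tol:
            break  # the gradient is numerically zero
        p = [-gi for gi in g]  # shift vector p_k (assumed: antigradient)
        alpha, fx = 1.0, f(x)
        # Halve alpha_k until f decreases by at least 0.5*alpha_k*||f'(x_k)||^2.
        while f([xi + alpha * pi for xi, pi in zip(x, p)]) > fx - 0.5 * alpha * g_sq:
            alpha *= 0.5
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
    return x


def f(x):
    # Illustrative function with its minimum at the point (1, -3).
    return (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 3.0) ** 2


def grad_f(x):
    return [2.0 * (x[0] - 1.0), 4.0 * (x[1] + 3.0)]


x_min = descend(f, grad_f, [0.0, 0.0])
```

For a smooth function the halving loop stops after a finite number of
trials, which is the property of the step choice stressed above.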

Let us briefly review the estimates of the rate of convergence
which are used in most cases in this book.

We say that a sequence $\{x_k\}$ converges to point $x_*$ at a linear
rate, or at the rate of a geometric progression (with the ratio $q$),
if from a certain $k$ onwards the inequality
$\|x_{k+1} - x_*\| \le q \|x_k - x_*\|$, where $0 < q < 1$, is
satisfied. If the inequality
$\|x_{k+1} - x_*\| \le q_k \|x_k - x_*\|$ is satisfied, where
$q_k \to 0$ as $k \to \infty$, we say that the rate of convergence of
the sequence $\{x_k\}$ is superlinear, or faster than the rate of
convergence of any geometric progression. If
$q_k \le C \|x_k - x_*\| \to 0$, then
$\|x_{k+1} - x_*\| \le C \|x_k - x_*\|^2$. This estimate is
characteristic of the quadratic rate of convergence.

The above estimates will occur in this book also in several other 
equivalent forms. 
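
The three estimates can be told apart numerically by looking at the
ratios of successive errors. The sketch below uses three artificial
error sequences chosen only for illustration (they do not come from
any algorithm of this book); the helper `ratios` is hypothetical.

```python
def ratios(errors, power=1):
    """Return e_{k+1} / e_k**power for consecutive errors e_k = ||x_k - x_*||."""
    return [errors[k + 1] / errors[k] ** power for k in range(len(errors) - 1)]


# Linear rate: e_k = q**k with q = 0.5, so e_{k+1} / e_k stays equal to q.
linear = [0.5 ** k for k in range(1, 8)]

# Superlinear rate: e_k = 1/k!, so e_{k+1} / e_k = 1/(k+1) tends to zero.
superlinear = [1.0]
for k in range(2, 9):
    superlinear.append(superlinear[-1] / k)

# Quadratic rate: e_{k+1} = C * e_k**2 with C = 1.
quadratic = [0.5]
for _ in range(4):
    quadratic.append(quadratic[-1] ** 2)

linear_q = ratios(linear)
superlinear_q = ratios(superlinear)
quadratic_c = ratios(quadratic, power=2)
```

A constant ratio indicates a linear rate, a ratio tending to zero a
superlinear rate, and a bounded ratio $e_{k+1} / e_k^2$ a quadratic rate.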

Some remarks on the notation used.

As mentioned before, the subject is treated for the case of an
n-dimensional vector space, which will be denoted by $E^n$. Vectors
will be denoted by lower-case letters $x$, $y$, $z$, etc. and their
components by superscripts, so that $x^i$ is the i-th component of
vector $x$. Subscripts denote the elements of a sequence. Matrices
are denoted by capital letters $A$, $B$, $C$, etc. An asterisk as
upper index denotes transposition, i.e. $A^*$ is the transpose of
matrix $A$. As a rule, vector $x$ means a column vector, so that $x^*$
denotes a row vector. The scalar product of two vectors is denoted by
$(x, y)$, i.e.

$$(x, y) = \sum_{i=1}^{n} x^i y^i.$$

The norm of a vector is understood to be its Euclidean norm, unless
otherwise specified:

$$\|x\| = \sqrt{(x, x)}.$$


In conclusion, the authors express their sincere gratitude 
to G. E. Lybarskaya, L. A. Sobolenko, E. I. Boguslavskaya and 
V. M. Panin for the invaluable assistance in preparing this book. 

Chapter I (except Sec. 5 and partly Sec. 2) and Chap. III (except 
Sec. 9 and partly Sec. 3) have been written by B. N. Pshenichny. 

Chapter II, the third and the fourth subsections of Sec. 2 and 
the fifth and sixth subsections of Sec. 3, and Sec. 9 of Chap. III 
have been written by Yu. M. Danilin. 

CHAPTER I


INTRODUCTION TO THE THEORY 
OF MATHEMATICAL PROGRAMMING 


This chapter describes some facts from the theory of convex sets 
and the necessary conditions of the extrema; these facts are neces- 
sary for understanding the matter set forth in subsequent chapters. 


1. CONVEX SETS 


In this section we consider the basic properties of convex sets in 
an n-dimensional Euclidean space. 


Definition. Separation Theorem 


Definition 1.1. A set of points $X$ in $E^n$ is called convex if together
with any $x_1, x_2 \in X$ it also contains all points of the form

$$x = \lambda x_1 + (1 - \lambda) x_2, \quad 0 \le \lambda \le 1.$$


In geometrical terms this means that if the end points of a seg- 
ment belong to a convex set X then the whole segment belongs to the 
set too. 

Lemma 1.1. The following statements hold:

(1) The intersection of any number of convex sets is convex.

(2) If $X$ is convex and $x_i \in X$, $i = 1, \ldots, m$, then for any
$\lambda_i$, $i = 1, \ldots, m$, such that $\sum_{i=1}^{m} \lambda_i = 1$,
$\lambda_i \ge 0$, the point $x = \sum_{i=1}^{m} \lambda_i x_i$ belongs
to $X$.

The following theorem and its corollaries are the basic tools for
obtaining results that characterize various properties of convex sets.

Theorem 1.1. Let $X$ be a convex set and $\bar{X}$ its closure. If point
$x_0$ does not belong to $\bar{X}$, then there exist a vector
$a \in E^n$, $a \ne 0$, and a number $\varepsilon > 0$ such that for all
$x \in X$

$$(a, x) \le (a, x_0) - \varepsilon.$$

Proof. $\bar{X}$ is a closed set, by definition. Let us show that it is
convex. Indeed, if $x \in \bar{X}$, then there is a sequence $\{x_k\}$,
$k = 1, 2, \ldots$, such that $x_k \in X$, $x_k \to x$. Now let
$x, y \in \bar{X}$, $0 \le \lambda \le 1$. Let us prove that
$\lambda x + (1 - \lambda) y \in \bar{X}$. Since $X$ is a convex set, it
follows from $x_k, y_k \in X$, $x_k \to x$, $y_k \to y$ that

$$\lambda x_k + (1 - \lambda) y_k \in X, \qquad \lambda x_k + (1 - \lambda) y_k \to \lambda x + (1 - \lambda) y.$$

This means that $\lambda x + (1 - \lambda) y \in \bar{X}$, i.e. $\bar{X}$
is convex.

Let us take a point $y \in \bar{X}$ whose distance from $x_0$ is the
least, i.e.

$$\|x - x_0\| \ge \|y - x_0\|, \quad x \in \bar{X}.$$

Since $\bar{X}$ is convex, for all $x \in \bar{X}$ and
$0 \le \lambda \le 1$ we have

$$\lambda x + (1 - \lambda) y = y + \lambda (x - y) \in \bar{X}.$$

Therefore

$$\|\lambda x + (1 - \lambda) y - x_0\|^2 = \|y - x_0 + \lambda (x - y)\|^2$$
$$= (y - x_0 + \lambda (x - y), \; y - x_0 + \lambda (x - y))$$
$$= (y - x_0, y - x_0) + 2\lambda (y - x_0, x - y) + \lambda^2 (x - y, x - y)$$
$$= \|y - x_0\|^2 + 2\lambda (y - x_0, x - y) + \lambda^2 \|x - y\|^2 \ge \|y - x_0\|^2.$$

The last inequality holds for any $\lambda$ between zero and unity.
Simplifying it (dividing by $\lambda > 0$) we obtain

$$2 (y - x_0, x - y) + \lambda \|x - y\|^2 \ge 0;$$

hence, letting $\lambda \to 0$,

$$(y - x_0, x - y) \ge 0.$$

Let $a = x_0 - y$. The last inequality can then be written in the
form $(a, x) \le (a, y)$. But

$$(a, y) = (a, x_0) - (a, x_0 - y) = (a, x_0) - \|a\|^2.$$

Setting $\varepsilon = \|a\|^2$, we finally obtain

$$(a, x) \le (a, x_0) - \varepsilon.$$

This inequality holds for any $x \in \bar{X}$. Besides,
$\varepsilon > 0$, as $x_0 \notin \bar{X}$ and consequently
$y \ne x_0$. Therefore

$$\varepsilon = \|a\|^2 = \|x_0 - y\|^2 > 0.$$

Q.E.D.
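
The construction used in the proof (nearest point $y$, then
$a = x_0 - y$ and $\varepsilon = \|a\|^2$) can be tried on a simple
set. In the sketch below the set is assumed, for illustration only,
to be a coordinate box, for which the nearest point is obtained by
clipping each coordinate; the function names are hypothetical.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def separate_point_from_box(x0, lower, upper):
    """Return (a, eps) with (a, x) <= (a, x0) - eps for every x in the box."""
    # Nearest point y of the box to x0: clip each coordinate.
    y = [min(max(c, lo), hi) for c, lo, hi in zip(x0, lower, upper)]
    a = [c - yi for c, yi in zip(x0, y)]  # a = x0 - y
    eps = dot(a, a)                        # eps = ||a||^2
    return a, eps


x0 = [2.0, 3.0]
a, eps = separate_point_from_box(x0, [0.0, 0.0], [1.0, 1.0])

# Check the separating inequality on a grid of points of the box.
grid = [[i / 4.0, j / 4.0] for i in range(5) for j in range(5)]
separated = all(dot(a, x) <= dot(a, x0) - eps for x in grid)
```

The inequality becomes an equality at the nearest point itself, which
shows that $\varepsilon$ cannot be enlarged for this choice of $a$.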

Remark. In proving theorem 1.1 we have proved at the same time that
the closure of a convex set is also convex. As a simple exercise the
reader can prove that the set of interior points of a convex set is
convex too.

Corollary 1.1. Let $X$ be a convex set and $x_0$ a boundary point of $X$.
Then there is a vector $a \ne 0$ such that

$$(a, x) \le (a, x_0), \quad x \in X.$$

Corollary 1.2. If $X$ and $Y$ are convex sets that do not intersect, then
there is a vector $a \ne 0$ such that

$$(a, x) \le (a, y), \quad x \in X, \; y \in Y.$$

Corollary 1.3. If $X$ and $Y$ are closed convex sets which do not
intersect and one of them is bounded, then there exist a vector
$a \ne 0$ and a number $\varepsilon > 0$ such that

$$(a, x) \le (a, y) - \varepsilon, \quad x \in X, \; y \in Y.$$


Convex Cones 


Definition 1.2. A set $K$ is called a convex cone if the set is convex
and together with every point $x \in K$ it contains all points
$\lambda x$ with $\lambda > 0$.

It is clear that if $x, y \in K$ then $x + y \in K$. In fact, since $K$
is a convex set, the point $\frac{1}{2} x + \frac{1}{2} y$ belongs to
$K$. But

$$x + y = 2 \left( \tfrac{1}{2} x + \tfrac{1}{2} y \right),$$

whence $x + y \in K$ by the definition of a cone. The most important
properties of cones are formulated in terms which establish the
relation between the original cone and its conjugate, or dual, cone.

Definition 1.3. Let $K$ be a convex cone. The set of all vectors
$y \in E^n$ satisfying the inequality $(x, y) \ge 0$ for any $x \in K$
is called the conjugate cone and denoted by $K^*$.

An elementary check shows that $K^*$ is also a convex cone.

Lemma 1.2. K* is a closed convex cone. 

Lemma 1.3. Let $K$ be a convex cone. Then $x_0 \in K$ if and only if
$(x_0, y) \ge 0$ for all $y \in K^*$. If $K$ is closed, then

$$(K^*)^* = K.$$


Proof. It is evident that if $x_0 \in K$, then $(x_0, y) \ge 0$ for all
$y \in K^*$. Let us prove the converse. Suppose it is false: let
$(x_0, y) \ge 0$ for any $y \in K^*$, but $x_0 \notin K$. Since $K$ is
a closed convex set, using theorem 1.1 we can assert that there is
a vector $a$ such that

$$(a, x_0) \le (a, x) - \varepsilon, \quad x \in K.$$

Now a closed cone $K$ always contains point 0. Therefore in particular

$$(a, x_0) \le -\varepsilon. \qquad (1.1)$$

On the other hand

$$(a, x) \ge 0, \quad x \in K. \qquad (1.2)$$

Indeed, if $(a, x_1) < 0$ for a certain $x_1 \in K$, then, since
$\lambda x_1 \in K$ with $\lambda > 0$,

$$(a, x_0) \le \lambda (a, x_1) - \varepsilon$$

and the last inequality must be valid for any $\lambda > 0$; this is
impossible if $(a, x_1) < 0$. Thus (1.2) is valid and consequently
$a \in K^*$. Then $(a, x_0) \ge 0$ and this contradicts (1.1). This
proves the first part of the lemma.

Let us now prove its second part. If $x \in K$, then $(x, y) \ge 0$
for all $y \in K^*$, by definition, and therefore $x \in (K^*)^*$,
i.e. $K \subset (K^*)^*$. Conversely, by definition, $x \in (K^*)^*$ if
and only if $(x, y) \ge 0$ for any $y \in K^*$. However, it was proved
above that in this case $x \in K$, i.e. $(K^*)^* \subset K$. Thus
$(K^*)^* = K$. Q.E.D.

Polyhedral cones are an important class of cones encountered 
in the theory of linear programming. 

Definition 1.4. A cone $K$ is called polyhedral if there exists a finite
set of n-dimensional vectors $a_i$, $i = 1, \ldots, m$, such that for
$x \in K$ the expansion

$$x = \sum_{i=1}^{m} \lambda_i a_i, \quad \lambda_i \ge 0, \quad i = 1, \ldots, m \qquad (1.3)$$

is valid and conversely (1.3) implies that $x \in K$.

Thus a polyhedral cone $K$ is the set of points which can be
represented in the form (1.3). Generally speaking, the representation
of a given point $x \in K$ in the form (1.3) is not unique.

Lemma 1.4. Let $x \in K$, $K$ being a polyhedral cone. Then there is an
expansion of $x$ in vectors $a_i$ with nonnegative coefficients
$\lambda_i$ such that the number of indices $i$ for which
$\lambda_i > 0$ does not exceed $n$, the number of dimensions of the
space; the vectors $a_i$ corresponding to nonzero $\lambda_i$ are
linearly independent.


Proof. Let $x \in K$, i.e. $x = \sum_{i=1}^{m} \lambda_i a_i$, and let
$I$ be the set of those indices $i$ for which $\lambda_i > 0$. Suppose
that the number of elements in $I$ is greater than $n$, or does not
exceed $n$ but the vectors $a_i$, $i \in I$, are linearly dependent.
Since more than $n$ linearly independent vectors cannot exist in an
n-dimensional space, in either case there are coefficients
$\alpha_i$, not all zero, such that $\sum_{i \in I} \alpha_i a_i = 0$.
Besides, by definition of $I$, $\lambda_i = 0$ if $i \notin I$ and so

$$x = \sum_{i \in I} \lambda_i a_i, \quad \lambda_i > 0, \quad i \in I.$$

Subtracting from this relation the preceding one multiplied by
$\varepsilon$, we obtain

$$x = \sum_{i \in I} (\lambda_i - \varepsilon \alpha_i) a_i.$$

Without loss of generality we can take that $\alpha_i > 0$ for some
$i \in I$. Setting

$$\varepsilon_0 = \min_{i \in I, \; \alpha_i > 0} \frac{\lambda_i}{\alpha_i}, \qquad \bar{\lambda}_i = \lambda_i - \varepsilon_0 \alpha_i,$$

we have

$$x = \sum_{i \in I} \bar{\lambda}_i a_i$$

where $\bar{\lambda}_i \ge 0$ and for one $i$ at least
$\bar{\lambda}_i = 0$.

Thus we have obtained an expansion of $x$ in vectors $a_i$ with
nonnegative coefficients; however, the number of strictly positive
coefficients has been diminished.

This process can now be applied further until the number of nonzero
coefficients becomes less than or equal to $n$ and the vectors $a_i$
for which $\lambda_i > 0$ become linearly independent. Since the
process diminishes a whole number, it obviously cannot be continued
infinitely, and after a certain number of steps we shall get an
expansion which satisfies the conditions of our lemma.
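
The reduction process of the proof can be written down directly. The
following is a minimal sketch under stated assumptions: exact
rational arithmetic is used so that a coefficient becomes exactly
zero at each step, and a linear dependence is found by Gauss-Jordan
elimination. The names `null_vector` and `reduce_expansion` are
hypothetical.

```python
from fractions import Fraction


def null_vector(vectors):
    """Return a nonzero alpha with sum(alpha[j] * vectors[j]) == 0, or None."""
    m, n = len(vectors), len(vectors[0])
    # Row-reduce the n x m matrix whose columns are the given vectors.
    rows = [[Fraction(vectors[j][i]) for j in range(m)] for i in range(n)]
    pivot_cols = []
    r = 0
    for c in range(m):
        piv = next((i for i in range(r, n) if rows[i][c] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        rows[r] = [v / rows[r][c] for v in rows[r]]
        for i in range(n):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [vi - factor * vr for vi, vr in zip(rows[i], rows[r])]
        pivot_cols.append(c)
        r += 1
    free = [c for c in range(m) if c not in pivot_cols]
    if not free:
        return None  # the vectors are linearly independent
    alpha = [Fraction(0)] * m
    alpha[free[0]] = Fraction(1)
    for k, c in enumerate(pivot_cols):
        alpha[c] = -rows[k][free[0]]
    return alpha


def reduce_expansion(vectors, lambdas):
    """Zero out coefficients until the vectors with lambda_i > 0 are independent."""
    lam = [Fraction(l) for l in lambdas]
    while True:
        support = [i for i, l in enumerate(lam) if l > 0]  # the index set I
        alpha = null_vector([vectors[i] for i in support]) if support else None
        if alpha is None:
            return lam
        if all(a <= 0 for a in alpha):
            alpha = [-a for a in alpha]  # make sure some alpha_i > 0
        eps0 = min(lam[i] / a for i, a in zip(support, alpha) if a > 0)
        for i, a in zip(support, alpha):
            lam[i] -= eps0 * a  # new coefficient lambda_i - eps0 * alpha_i


vectors = [(1, 0), (0, 1), (1, 1)]
reduced = reduce_expansion(vectors, [1, 1, 1])
```

Here the point $x = a_1 + a_2 + a_3 = (2, 2)$ of a cone in $E^2$ is
re-expanded with at most two positive coefficients, the represented
point remaining the same.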

Lemma 1.5. A polyhedral cone is closed.

Lemma 1.6. Let the cone $K$ be defined by a system of linear inequalities

$$(a_i, x) \ge 0, \quad i = 1, \ldots, m,$$

where $a_i \in E^n$. Then the conjugate cone $K^*$ is a polyhedral cone and
consists of points $y$ which can be represented in the form

$$y = \sum_{i=1}^{m} \lambda_i a_i, \quad \lambda_i \ge 0, \quad i = 1, \ldots, m.$$

Proof. Let us consider the cone

$$\tilde{K} = \left\{ y : y = \sum_{i=1}^{m} \lambda_i a_i, \; \lambda_i \ge 0, \; i = 1, \ldots, m \right\}.$$

By definition, $\tilde{K}^*$ is the set of points $x$ for which
$(x, y) \ge 0$, $y \in \tilde{K}$, i.e.
$\left( x, \sum_{i=1}^{m} \lambda_i a_i \right) \ge 0$ for all
$\lambda_i \ge 0$. Then

$$\left( x, \sum_{i=1}^{m} \lambda_i a_i \right) = \sum_{i=1}^{m} \lambda_i (x, a_i) \ge 0.$$

The last inequality can obviously be valid for any $\lambda_i \ge 0$
only if $(a_i, x) \ge 0$, $i = 1, \ldots, m$, i.e. if $x \in K$. Thus
$\tilde{K}^* = K$. Since $\tilde{K}$ is a polyhedral cone, it is closed
and by lemma 1.3 $(\tilde{K}^*)^* = \tilde{K}$. Thus
$K^* = \tilde{K}$. Q.E.D.

Remark. The lemma proved above is known as the Farkas-Minkowski
lemma and is used as the basic tool for obtaining the necessary
conditions for extrema.


Strictly and Strongly Convex Sets 


Definition 1.5. A set $X \subset E^n$ is called strictly convex if for
any $x_1, x_2 \in X$, $x_1 \ne x_2$, all points of the form

$$\lambda x_1 + (1 - \lambda) x_2, \quad 0 < \lambda < 1,$$

are interior points of this set.

Definition 1.6. A set $X \subset E^n$ is called strongly convex if there
is a constant $\gamma > 0$ such that any point

$$\frac{x_1 + x_2}{2} + y \in X$$

if $x_1, x_2 \in X$ and $\|y\| \le \gamma \|x_1 - x_2\|^2$.

It is easily ascertained that a strongly convex set is also strictly
convex (but not the converse).


2. CONVEX FUNCTIONS 


Convex functions have a number of important properties and
constitute one of the main objects of study in the theory of
mathematical programming. The problem of convex programming, which is
the most thoroughly investigated extremal problem, is formulated in
terms of convex sets. However, convex functions play a decisive role
in the general nonlinear problem too, since sufficiently general and
comprehensive necessary conditions for extrema can be formulated only
for the case where the directional derivatives of the functions at
the given point are convex functions.

We shall mainly study convex functions defined over the whole space,
so that the value of any given convex function is finite at each
point $x \in E^n$. From the viewpoint of general theory it is
sometimes expedient to consider convex functions which can take the
value $+\infty$ at some points. In what follows, however, such
functions will occur only in studying dual problems of convex
programming. Therefore in all statements of this section, unless
otherwise specified, we suppose that the convex function under
consideration is defined over the whole space $E^n$ and takes finite
values.


Definition. Basic Properties 
Definition 2.1. A function $f(x)$ defined for all $x \in E^n$ is called
convex if for any $x_1$, $x_2$ and $\lambda_1, \lambda_2 \ge 0$,
$\lambda_1 + \lambda_2 = 1$,

$$f(\lambda_1 x_1 + \lambda_2 x_2) \le \lambda_1 f(x_1) + \lambda_2 f(x_2).$$

Remark. If $f(x) = +\infty$ for some $x$, the definition remains valid.

Lemma 2.1. Let $f_1(x)$ and $f_2(x)$ be convex functions and $c_1$, $c_2$
nonnegative numbers. Then

$$f(x) = c_1 f_1(x) + c_2 f_2(x)$$

is a convex function too.

Lemma 2.2. Let $f_i(x)$, $i = 1, \ldots, m$, be convex functions. Then

$$f(x) = \max_{i} f_i(x)$$

is also a convex function.

Lemma 2.3. If $f(x)$ is a convex function, then we have

$$f(\lambda_1 x_1 + \lambda_2 x_2 + \ldots + \lambda_m x_m) \le \lambda_1 f(x_1) + \lambda_2 f(x_2) + \ldots + \lambda_m f(x_m)$$

for any nonnegative $\lambda_i$ which satisfy the condition

$$\lambda_1 + \ldots + \lambda_m = 1.$$


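Proof. With $m = 2$ this statement follows from the definition of
a convex function. Suppose the lemma has been proved for $m \le k$.
Let us show that the statement is valid for $m = k + 1$. Let
$\lambda_i \ge 0$, $i = 1, \ldots, k + 1$,
$\lambda_1 + \ldots + \lambda_{k+1} = 1$. Evidently one can consider
all $\lambda_i$ to be strictly greater than zero; otherwise the above
inequality is satisfied by the induction hypothesis.

Thus $\lambda_{k+1} > 0$ and
$1 - \lambda_{k+1} = \lambda_1 + \ldots + \lambda_k > 0$.

From the definition of a convex function we have

$$f(\lambda_1 x_1 + \ldots + \lambda_k x_k + \lambda_{k+1} x_{k+1}) \le (1 - \lambda_{k+1}) f\left( \frac{\lambda_1}{1 - \lambda_{k+1}} x_1 + \ldots + \frac{\lambda_k}{1 - \lambda_{k+1}} x_k \right) + \lambda_{k+1} f(x_{k+1}). \qquad (2.1)$$

But by induction

$$f\left( \frac{\lambda_1}{1 - \lambda_{k+1}} x_1 + \ldots + \frac{\lambda_k}{1 - \lambda_{k+1}} x_k \right) \le \frac{\lambda_1}{1 - \lambda_{k+1}} f(x_1) + \ldots + \frac{\lambda_k}{1 - \lambda_{k+1}} f(x_k) \qquad (2.2)$$

since

$$\frac{\lambda_1}{1 - \lambda_{k+1}} + \ldots + \frac{\lambda_k}{1 - \lambda_{k+1}} = 1.$$

Comparing (2.1) and (2.2) we obtain the required result.
The lemma has been proved using the principle of mathematical
induction.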
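
The inequality of lemma 2.3 is easy to check numerically. The sketch
below does so for two convex functions of one variable and a set of
random weights; the functions, the seed and the helper name
`jensen_gap` are chosen only for illustration.

```python
import math
import random


def jensen_gap(f, points, weights):
    """Return sum(w_i * f(x_i)) - f(sum(w_i * x_i)); nonnegative for convex f."""
    mean = sum(w * x for w, x in zip(weights, points))
    return sum(w * f(x) for w, x in zip(weights, points)) - f(mean)


rng = random.Random(0)  # fixed seed so that the run is reproducible
points = [rng.uniform(-2.0, 2.0) for _ in range(6)]
raw = [rng.random() for _ in range(6)]
weights = [w / sum(raw) for w in raw]  # nonnegative weights summing to one

gap_exp = jensen_gap(math.exp, points, weights)
gap_square = jensen_gap(lambda x: x * x, points, weights)
gap_concave = jensen_gap(lambda x: -x * x, points, weights)
```

For the convex functions the gap is nonnegative, while for the
function $-x^2$, which is not convex, the sign is reversed.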


Lemma 2.4. The function $f(x)$ is convex if and only if for any $x$
and $p \in E^n$ the function of the one-dimensional variable $t$

$$\varphi_{x,p}(t) = f(x + tp) \qquad (2.3)$$

is a convex function.


Differential Properties 


Let $f(x)$ be a differentiable function whose continuous gradient is
$f'(x)$.

Lemma 2.5. The following statements are equivalent:

(1) $f(x)$ is a convex function.

(2) $f(x_2) - f(x_1) \ge (f'(x_1), x_2 - x_1)$ for any $x_1, x_2 \in E^n$.

(3) $(f'(x + \lambda p), p)$ is a nondecreasing function of $\lambda$
for any $x, p \in E^n$.

If $f(x)$ is a twice continuously differentiable function, then these
statements are also equivalent to the following one:

(4) $f''(x)$, the matrix of second derivatives, is positive
semidefinite, i.e. $(f''(x) p, p) \ge 0$ for any $x, p \in E^n$.

Proof. Note first of all that if

$$\varphi_{x,p}(\lambda) = f(x + \lambda p),$$

then, as shown above, for convex $f(x)$ the function
$\varphi_{x,p}(\lambda)$ is convex, and

$$\varphi'_{x,p}(\lambda) = (f'(x + \lambda p), p), \qquad \varphi''_{x,p}(\lambda) = (p, f''(x + \lambda p) p). \qquad (2.4)$$

Let us show that statement (2) follows from statement (1). In fact,
since

$$f((1 - \lambda) x_1 + \lambda x_2) \le (1 - \lambda) f(x_1) + \lambda f(x_2), \quad 0 < \lambda < 1,$$

we have

$$\frac{f(x_1 + \lambda (x_2 - x_1)) - f(x_1)}{\lambda} \le f(x_2) - f(x_1).$$

Taking the limit as $\lambda \to 0$ we obtain

$$(f'(x_1), x_2 - x_1) \le f(x_2) - f(x_1).$$

Thus statement (2) follows from (1) or, in short, (1) → (2).

Let us show that (2) → (3). From statement (2) we have for
$\varphi_{x,p}(\lambda)$:

$$\varphi'_{x,p}(\lambda_1)(\lambda_2 - \lambda_1) \le \varphi_{x,p}(\lambda_2) - \varphi_{x,p}(\lambda_1),$$
$$\varphi'_{x,p}(\lambda_2)(\lambda_1 - \lambda_2) \le \varphi_{x,p}(\lambda_1) - \varphi_{x,p}(\lambda_2).$$

The two inequalities with $\lambda_2 > \lambda_1$ give

$$\varphi'_{x,p}(\lambda_1) \le \frac{\varphi_{x,p}(\lambda_2) - \varphi_{x,p}(\lambda_1)}{\lambda_2 - \lambda_1} \le \varphi'_{x,p}(\lambda_2),$$

i.e. $(f'(x + \lambda_1 p), p) \le (f'(x + \lambda_2 p), p)$. Q.E.D.

(3) → (1). Let $(f'(x + \lambda p), p)$ be a nondecreasing function of
$\lambda$. Then $\varphi'_{x,p}(\lambda_1) \le \varphi'_{x,p}(\lambda_2)$
with $\lambda_2 > \lambda_1$.

If $0 \le \mu \le 1$, then

$$0 \le \mu (\lambda_2 - \lambda_1) \int_0^1 \left[ \varphi'_{x,p}(\lambda_1 + t (\lambda_2 - \lambda_1)) - \varphi'_{x,p}(\lambda_1 + t \mu (\lambda_2 - \lambda_1)) \right] dt$$
$$= (1 - \mu) \varphi_{x,p}(\lambda_1) + \mu \varphi_{x,p}(\lambda_2) - \varphi_{x,p}((1 - \mu) \lambda_1 + \mu \lambda_2),$$

i.e. $\varphi_{x,p}(\lambda)$ is a convex function of $\lambda$. Then,
as follows from lemma 2.4, $f(x)$ is a convex function.

(3) → (4). Since $\varphi'_{x,p}(\lambda) = (f'(x + \lambda p), p)$ is
a nondecreasing function, $\varphi''_{x,p}(\lambda) \ge 0$, i.e.

$$(p, f''(x + \lambda p) p) \ge 0. \qquad (2.5)$$

Hence the matrix $f''(x)$ is positive semidefinite.

(4) → (3). Conversely, if (2.5) is satisfied, then
$\varphi''_{x,p}(\lambda)$ is nonnegative and consequently the function
$\varphi'_{x,p}(\lambda) = (f'(x + \lambda p), p)$ is nondecreasing.

Since it has been shown that (1) → (2) → (3) → (1), (4) → (3) and
(3) → (4), the equivalence of all four statements in lemma 2.5
is proved.

Corollary 2.1. The quadratic function

$$f(x) = \tfrac{1}{2} (x, Ax) + (b, x)$$

is convex if and only if matrix $A$ is positive semidefinite, i.e.
$(Ax, x) \ge 0$ for any $x$.

Indeed, $f(x)$ is twice continuously differentiable and $f''(x) = A$.
Therefore the statement of the corollary follows directly from
statement (4) of lemma 2.5.

Lemma 2.5 provides a series of criteria which enable us to establish
whether a given function is convex.
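
As an illustration of these criteria, the sketch below tests
a quadratic function of two variables in two ways: by the sign
conditions on a symmetric 2 x 2 matrix (criterion (4)) and by
checking inequality (2) on a grid of points. The matrices, the grid
and all function names are assumptions made for the example; the
matrix $A$ is taken to be symmetric.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def quadratic(A, b):
    """Return f(x) = 0.5*(x, Ax) + (b, x) and its gradient Ax + b (A symmetric)."""
    def f(x):
        return 0.5 * dot(x, matvec(A, x)) + dot(b, x)

    def grad(x):
        return [yi + bi for yi, bi in zip(matvec(A, x), b)]

    return f, grad


def is_psd_2x2(A):
    """Criterion (4) for a symmetric 2 x 2 matrix: (Ap, p) >= 0 for all p."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return A[0][0] >= 0 and A[1][1] >= 0 and det >= 0


def satisfies_gradient_inequality(f, grad, points, tol=1e-12):
    """Criterion (2), checked on a finite set of points only."""
    return all(
        f(y) - f(x) >= dot(grad(x), [yi - xi for yi, xi in zip(y, x)]) - tol
        for x in points
        for y in points
    )


grid = [[float(i), float(j)] for i in (-1, 0, 1) for j in (-1, 0, 1)]
b = [1.0, -1.0]
f_convex, g_convex = quadratic([[2.0, 1.0], [1.0, 2.0]], b)
f_saddle, g_saddle = quadratic([[1.0, 2.0], [2.0, 1.0]], b)
```

A check on a finite grid can only refute convexity, not prove it; the
matrix criterion settles the question for a quadratic function.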

Definition 2.2. Let the convex function $f(x)$ be defined at point $x_0$
and have a finite value there. Vector $g$ is called a subgradient or
support vector for function $f(x)$ at point $x_0$ if for any $x$ the
inequality

$$f(x) - f(x_0) \ge (g, x - x_0) \qquad (2.6)$$

is satisfied.

It can be shown that if $f(x)$ is continuous at point $x_0$, then
subgradients exist at this point and the set of these subgradients is
convex, closed and bounded. It follows from lemma 2.5 (statement 2)
that $f'(x_0)$ is a subgradient of function $f(x)$ at point $x_0$ if
$f(x)$ is differentiable. Thus the concept of subgradient is a
generalization of the gradient concept.

It is clear from the definition that if $g_1$ and $g_2$ are
subgradients of convex functions $f_1(x)$ and $f_2(x)$ at point $x_0$,
then $c_1 g_1 + c_2 g_2$ is a subgradient of function
$c_1 f_1(x) + c_2 f_2(x)$, $c_1, c_2 \ge 0$. Thus, knowing the
subgradients of certain convex functions, it is easy to compute a
subgradient of their linear combination with nonnegative coefficients.

Now let $f(x) = \max_{i} f_i(x)$, $i = 1, \ldots, m$, where each
$f_i(x)$ is a convex function, and let $g_i$ be subgradients of
$f_i(x)$ at point $x_0$. Then the vector

$$g = \sum_{i=1}^{m} \lambda_i g_i,$$

where $\sum_{i=1}^{m} \lambda_i = 1$, $\lambda_i \ge 0$,
$i = 1, \ldots, m$, and $\lambda_i = 0$ if $f_i(x_0) < f(x_0)$, is
a subgradient of function $f(x)$ at point $x_0$.
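
The rule for the maximum of convex functions can be sketched as
follows. As an assumption made for the example, equal weights are
placed on the functions attaining the maximum (any nonnegative
weights summing to one would do), and inequality (2.6) is then
checked on sample points. The two functions and all names are
hypothetical.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def max_subgradient(funcs, grads, x0, tol=1e-12):
    """Subgradient of max_i f_i at x0 using equal weights on the active f_i."""
    values = [f(x0) for f in funcs]
    f_max = max(values)
    active = [i for i, v in enumerate(values) if v >= f_max - tol]
    lam = 1.0 / len(active)  # lambda_i for active indices, zero for the others
    g = [0.0] * len(x0)
    for i in active:
        g = [gj + lam * gij for gj, gij in zip(g, grads[i](x0))]
    return g


# f(x) = max(x^2, (x - 2)^2) in one dimension; both pieces are convex.
funcs = [lambda x: x[0] ** 2, lambda x: (x[0] - 2.0) ** 2]
grads = [lambda x: [2.0 * x[0]], lambda x: [2.0 * (x[0] - 2.0)]]


def f_max(x):
    return max(f(x) for f in funcs)


def is_subgradient(g, x0, samples, tol=1e-12):
    """Check inequality (2.6) on a finite set of sample points."""
    return all(
        f_max(x) - f_max(x0) >= dot(g, [xi - x0i for xi, x0i in zip(x, x0)]) - tol
        for x in samples
    )


samples = [[k / 4.0] for k in range(-8, 17)]
g_kink = max_subgradient(funcs, grads, [1.0])    # both functions are active
g_smooth = max_subgradient(funcs, grads, [0.0])  # only the second is active
```

At the point where the two functions coincide the computed
subgradient is zero, which agrees with that point being the minimum
of the maximum function.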


Strictly and Strongly Convex Functions 


Functions for which the condition of convexity is satisfied in a
stronger sense play a very important role in mathematical
programming.

Definition 2.3. Function $f(x)$ is called strictly convex if

$$f((1 - \lambda) x + \lambda y) < (1 - \lambda) f(x) + \lambda f(y), \quad 0 < \lambda < 1, \quad x \ne y.$$

If a strictly convex function is sufficiently smooth, then statements
similar to those formulated in lemma 2.5 are valid for it.

Lemma 2.6. The following statements are equivalent:

(1) $f(x)$ is a strictly convex function.

(2) $f(x_2) - f(x_1) > (f'(x_1), x_2 - x_1)$ for any
$x_1, x_2 \in E^n$, $x_1 \ne x_2$.

(3) $(f'(x + \lambda p), p)$ is a strictly increasing function of
$\lambda$ for any $x$ and $p \ne 0$.

Definition 2.4. Function $f(x)$ is called strongly convex if for
any $x_1, x_2 \in E^n$

$$f\left( \frac{x_1 + x_2}{2} \right) \le \frac{1}{2} \left[ f(x_1) + f(x_2) \right] - \gamma \|x_1 - x_2\|^2 \qquad (2.7)$$

where $\gamma > 0$ is a constant, which may be arbitrarily small.

A strongly convex function, as can easily be ascertained, is also
strictly convex but, generally speaking, the converse does not hold.

In what follows we shall consider twice continuously differentiable
strongly convex functions.

Lemma 2.7. If $f(x)$ is a twice continuously differentiable function,
then the condition of strong convexity (2.7) is equivalent to the
condition

$$(f''(x) p, p) \ge m \|p\|^2, \quad m > 0, \qquad (2.8)$$

for any $x$ and $p \in E^n$.

Inequality (2.8) implies that matrix $f''(x)$ is strongly positive.

Corollary 2.2. A strictly convex quadratic function
$f(x) = \tfrac{1}{2} (Ax, x) + (b, x)$ defined over the space $E^n$ is
also strongly convex, and the converse is valid.

Proof. It is necessary to prove only the first statement.
From (2) of lemma 2.6 it follows that for any $x \ne 0$

$$(Ax, x) > 0. \qquad (2.9)$$

At the same time

$$(Ax, x) \ge \lambda (x, x) = \lambda \|x\|^2 \qquad (2.10)$$

where $\lambda$ is the least eigenvalue of $A$, the matrix of second
derivatives, equality in (2.10) being attained on the corresponding
eigenvector. From (2.9) and (2.10) it follows that $\lambda > 0$ and
$f(x)$ is a strongly convex function.

Let $x_0$ be an arbitrary point in $E^n$. Consider the set

$$Y = \{ x : f(x) \le f(x_0) \}.$$

Lemma 2.8. If $f(x)$ is a twice continuously differentiable strongly
convex function, then $Y$ is a closed bounded strongly convex set.

Proof. The set $Y$ is closed since $f(x)$ is a continuous function.

Let us prove that $Y$ is bounded. By Taylor's formula

$$f(x) = f(x_0) + (f'(x_0), x - x_0) + \tfrac{1}{2} (f''(\xi)(x - x_0), x - x_0)$$

where $\xi = x_0 + \theta (x - x_0)$, $\theta \in [0, 1]$. Using (2.8) we
have for $x \in Y$

$$f(x_0) \ge f(x) \ge f(x_0) + (f'(x_0), x - x_0) + \frac{m}{2} \|x - x_0\|^2.$$

Hence

$$\frac{m}{2} \|x - x_0\|^2 + (f'(x_0), x - x_0) \le 0,$$

i.e.

$$\frac{m}{2} \|x - x_0\|^2 \le |(f'(x_0), x - x_0)| \le \|f'(x_0)\| \, \|x - x_0\|$$

or

$$\|x - x_0\| \le \frac{2}{m} \|f'(x_0)\|.$$

This last inequality proves that $Y$ is bounded.

Finally, let us establish that $Y$ is a strongly convex set. Let
$x_1, x_2 \in Y$. Using Lagrange's formula and condition (2.7) we obtain

$$f\left( \frac{x_1 + x_2}{2} + y \right) = f\left( \frac{x_1 + x_2}{2} \right) + (f'(\xi), y) \le \frac{1}{2} \left[ f(x_1) + f(x_2) \right] - \gamma \|x_1 - x_2\|^2 + M \|y\| \qquad (2.11)$$

where $\xi = \frac{x_1 + x_2}{2} + \theta y$, $\theta \in [0, 1]$, and
$M$ is the maximum value of $\|f'(x)\|$, the norm of the derivative,
on the set $Y$.

Assuming $f(x_1) \ge f(x_2)$, we have
$\frac{1}{2} [f(x_1) + f(x_2)] \le f(x_1)$. If
$\|y\| \le \frac{\gamma}{M} \|x_2 - x_1\|^2$, then from (2.11) it
follows that

$$f\left( \frac{x_1 + x_2}{2} + y \right) \le f(x_1) \le f(x_0),$$

i.e. $\frac{x_1 + x_2}{2} + y \in Y$. By definition this means that $Y$
is a strongly convex set.

The lemma is proved.

Remark. The set $Y$ remains closed and strongly convex also in
case $f(x)$ is a differentiable or continuous strongly convex function.
The proof of $Y$ being strongly convex is based on the fact that a
continuous strongly convex function satisfies a Lipschitz condition
in every bounded set (see N. Bourbaki).

Lemma 2.9. If the matrix $f''(x)$ satisfies condition (2.8), then the
inverse matrix $[f''(x)]^{-1}$ exists and

$$([f''(x)]^{-1} p, p) \le \frac{1}{m} \|p\|^2.$$

If moreover the matrix $f''(x)$ is bounded, i.e.

$$(f''(x) p, p) \le M \|p\|^2, \qquad (2.12)$$

then

$$([f''(x)]^{-1} p, p) \ge \frac{1}{M} \|p\|^2.$$
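
The two estimates of lemma 2.9 can be checked on a small example. In
the sketch below $f''(x)$ is replaced by a fixed symmetric 2 x 2
matrix, for which $m$ and $M$ are the least and the greatest
eigenvalues; the matrix and all function names are assumptions made
for the illustration.

```python
import math


def eigen_bounds_2x2(A):
    """Least and greatest eigenvalues (m, M) of a symmetric 2 x 2 matrix."""
    half_trace = 0.5 * (A[0][0] + A[1][1])
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = math.sqrt(max(half_trace * half_trace - det, 0.0))
    return half_trace - disc, half_trace + disc


def inverse_2x2(A):
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det, A[0][0] / det]]


def quad_form(B, p):
    """Return (Bp, p)."""
    return sum(p[i] * B[i][j] * p[j] for i in range(2) for j in range(2))


def inverse_bounds_hold(A, directions, tol=1e-12):
    """Check ||p||^2 / M <= (A^{-1} p, p) <= ||p||^2 / m for the given p."""
    m, M = eigen_bounds_2x2(A)
    A_inv = inverse_2x2(A)
    for p in directions:
        norm_sq = p[0] * p[0] + p[1] * p[1]
        value = quad_form(A_inv, p)
        if not (norm_sq / M - tol <= value <= norm_sq / m + tol):
            return False
    return True


A = [[2.0, 1.0], [1.0, 2.0]]  # eigenvalues 1 and 3
m, M = eigen_bounds_2x2(A)
directions = [[math.cos(t * math.pi / 8), math.sin(t * math.pi / 8)]
              for t in range(16)]
ok = inverse_bounds_hold(A, directions)
```

Both bounds are attained on the eigenvectors of the matrix, so
neither constant can be improved.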


Concave Functions 
Definition. If for any $x_1, x_2 \in E^n$ and any $0 \le \lambda \le 1$
the inequality

$$f(\lambda x_1 + (1 - \lambda) x_2) \ge \lambda f(x_1) + (1 - \lambda) f(x_2)$$

is satisfied, then the function $f(x)$ is called concave.

It follows that the function $f(x)$ is concave if and only if the
function $-f(x)$ is convex. Taking this into account, all the
properties of concave functions can be obtained by a simple
reformulation of the corresponding properties of convex functions.
In a way analogous to that used for convex functions, strictly and
strongly concave functions can be defined and their properties
studied.


3. CONVEX PROGRAMMING


The subject matter of convex programming is minimization of 
a convex function in a convex domain. Convex programming is the 
most elaborated part of mathematical programming. 


Formulation of the Problem.
Basic Properties


We are given a convex continuous function $f(x)$, defined for all
$x \in E^n$, and a convex set $X$. It is required to find the minimum of
$f(x)$ in the set $X$, i.e. to find a point $x_*$ such that

$$f(x_*) \le f(x), \quad x \in X.$$


Lemma 3.1. A convex continuous function $f(x)$ attains its minimum
in a compact convex set $X$.

Proof. The statement is just a particular case of the well-known
Weierstrass theorem, which states that a continuous function attains
its minimum in a compact set.

Lemma 3.2. Let $X$ be a closed set and $f(x)$ a twice continuously
differentiable strongly convex function. Then $f(x)$ attains its
minimum in $X$.

Proof. Let $x_0 \in X$. Consider the set

$$Y = \{ x : f(x) \le f(x_0) \}.$$

By lemma 2.8 it is closed and bounded. Consider now the intersection
$X \cap Y$. Obviously, if $x_*$ is the minimum point of $f(x)$ in the
set $X \cap Y$, then this point is the minimum point of $f(x)$ in $X$
as well. But the set $X \cap Y$ is bounded and closed, being the
intersection of two closed sets one of which is bounded. Therefore
$f(x)$ attains its minimum in $X \cap Y$ and consequently in $X$ as
a whole.

Convex and strictly convex functions can fail to attain their
minimum.

Lemma 3.3. The set of points $X_* \subset X$ at which the convex
function $f(x)$ attains its minimum in $X$ is convex.

Lemma 3.4. A strictly convex function that attains its minimum in a
convex set $X$ does so at one and only one point.

Proof. Let $x_1$ and $x_2$ be different points of minimum of $f(x)$
in $X$. Then

$$f\left( \tfrac{1}{2} x_1 + \tfrac{1}{2} x_2 \right) < \tfrac{1}{2} f(x_1) + \tfrac{1}{2} f(x_2) = f(x_1), \qquad \tfrac{1}{2} x_1 + \tfrac{1}{2} x_2 \in X.$$

This contradicts the fact that $x_1$ is a point of minimum of $f(x)$.


Necessary Conditions for a Minimum 


Let $f(x)$ be a continuously differentiable convex function and $X$
a convex set. We have to consider the following question: if $x_*$ is
the minimum point of $f(x)$ in $X$, what conditions are to be satisfied
at this point?

Definition 3.1. Let $x_0 \in X$. We denote by $K(x_0)$ the set of
vectors $p$ such that $p \in K(x_0)$ if and only if there is an
$\alpha > 0$ such that $x_0 + \alpha p \in X$.

The set $K(x_0)$ is called the cone of admissible directions for $X$ at
point $x_0$.

Lemma 3.5. $K(x_0)$ is a convex cone. If $p \in K(x_0)$ and
$x_0 + \alpha_0 p \in X$, then $x_0 + \alpha p \in X$ for any
$0 \le \alpha \le \alpha_0$.

Theorem 3.1. Let $x_*$ be the minimum point of a continuously
differentiable convex function $f(x)$ in a convex set $X$. Then

$$f'(x_*) \in K^*(x_*). \qquad (3.1)$$

Conversely, if (3.1) holds, then $x_*$ is the minimum point of $f(x)$
in $X$.

Proof. Let (3.1) be satisfied at point $x_*$. Then
$(f'(x_*), p) \ge 0$, $p \in K(x_*)$. Further, if $x \in X$, then
$p = x - x_* \in K(x_*)$, for $x_* + (x - x_*) = x \in X$. Therefore

$$(f'(x_*), x - x_*) \ge 0, \quad x \in X.$$

By lemma 2.5 we have for a convex function

$$f(x) - f(x_*) \ge (f'(x_*), x - x_*).$$

Hence

$$f(x) - f(x_*) \ge 0, \quad x \in X;$$

this shows that $x_*$ is the minimum point of $f(x)$ in $X$.

Let us now prove that condition (3.1) is necessary. Let $x_*$ be the
minimum point. Then for any $x \in X$ and $\lambda$,
$0 < \lambda \le 1$, we have

$$f((1 - \lambda) x_* + \lambda x) = f(x_* + \lambda (x - x_*)) \ge f(x_*)$$

or

$$\frac{f(x_* + \lambda (x - x_*)) - f(x_*)}{\lambda} \ge 0.$$

In the limit as $\lambda \to 0$ we obtain

$$(f'(x_*), x - x_*) \ge 0, \quad x \in X. \qquad (3.2)$$

Let now $p \in K(x_*)$. Then $x_* + \alpha p = x \in X$, $\alpha > 0$, or

$$p = \frac{1}{\alpha} (x - x_*).$$

Then by (3.2) and with $\alpha > 0$, we have

$$(f'(x_*), p) = \frac{1}{\alpha} (f'(x_*), x - x_*) \ge 0. \qquad (3.3)$$

Inequality (3.3) is valid for any $p \in K(x_*)$. Hence

$$f'(x_*) \in K^*(x_*).$$

Corollary 3.1. By theorem 3.1 point $x_*$ is the minimum point of
$f(x)$ in $X$ if and only if the inequality

$$(f'(x_*), x - x_*) \ge 0, \quad x \in X,$$

is satisfied. In fact, as just shown, (3.2) is equivalent to (3.1).
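
The inequality of corollary 3.1 can be verified on a simple problem.
As an assumption made for the illustration, we minimize
$f(x) = \|x - c\|^2$ on a coordinate box, where the minimum point is
obtained by clipping the coordinates of $c$ to the box; all names
below are hypothetical.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def project_onto_box(c, lower, upper):
    """Minimum point of ||x - c||^2 over the box: clip each coordinate of c."""
    return [min(max(ci, lo), hi) for ci, lo, hi in zip(c, lower, upper)]


def grad_f(x, c):
    # Gradient of f(x) = ||x - c||^2.
    return [2.0 * (xi - ci) for xi, ci in zip(x, c)]


def satisfies_minimum_condition(x_star, c, feasible_points, tol=1e-12):
    """Check (f'(x_star), x - x_star) >= 0 on the given points of X."""
    g = grad_f(x_star, c)
    return all(
        dot(g, [xi - si for xi, si in zip(x, x_star)]) >= -tol
        for x in feasible_points
    )


c = [1.5, -0.5]
x_star = project_onto_box(c, [0.0, 0.0], [1.0, 1.0])
grid = [[i / 4.0, j / 4.0] for i in range(5) for j in range(5)]

holds_at_minimum = satisfies_minimum_condition(x_star, c, grid)
holds_elsewhere = satisfies_minimum_condition([0.5, 0.5], c, grid)
```

The condition holds at the clipped point and fails at an interior
point of the box which is not the minimum point.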

Let us show how theorem 3.1 is applied to the case where domain $X$
is defined by a system of linear equalities and inequalities.

Given vectors $a_i \in E^n$, $i \in J^- \cup J^0$, where $J^-$ and
$J^0$ are finite sets of indices, and corresponding numbers $b_i$. Let
domain $X$ be defined by a system of equalities and inequalities:

$$(a_i, x) - b_i \le 0, \quad i \in J^-, \qquad (a_i, x) - b_i = 0, \quad i \in J^0. \qquad (3.4)$$

Let us describe the cone $K(x_0)$ at an arbitrary point $x_0 \in X$.
We set

$$J^-(x_0) = \{ i : (a_i, x_0) - b_i = 0, \; i \in J^- \}.$$

By definition, $p \in K(x_0)$ if $x_0 + \alpha p \in X$ with
sufficiently small $\alpha > 0$. It is clear that
$x_0 + \alpha p \in X$, i.e. that point $x_0 + \alpha p$ satisfies
(3.4) with small $\alpha$, if and only if

$$(a_i, p) \le 0, \quad i \in J^-(x_0), \qquad (a_i, p) = 0, \quad i \in J^0. \qquad (3.5)$$

Thus cone $K(x_0)$ is described by system (3.5), which we can rewrite
in an equivalent form:

$$(-a_i, p) \ge 0, \quad i \in J^-(x_0), \qquad (a_i, p) \ge 0, \quad i \in J^0, \qquad (-a_i, p) \ge 0, \quad i \in J^0.$$

By lemma 1.6 a vector $y \in K^*(x_0)$ can be represented in the form

$$y = \sum_{i \in J^-(x_0)} (-u^i) a_i + \sum_{i \in J^0} (-u^{+i}) a_i + \sum_{i \in J^0} u^{-i} a_i,$$

where $u^i$, $u^{+i}$, $u^{-i}$ are nonnegative numbers. Denoting
$u^i = u^{+i} - u^{-i}$, $i \in J^0$, we obtain

$$y = - \sum_{i \in J^-(x_0)} u^i a_i - \sum_{i \in J^0} u^i a_i, \quad u^i \ge 0, \quad i \in J^-(x_0). \qquad (3.6)$$
Theorem 3.2. Let $f(x)$ be a convex differentiable function and let set $X$ be defined by system (3.4). Then for point $x_*$ to be the minimum point of $f(x)$ in $X$ it is necessary and sufficient that there exist numbers $u^i$, $i \in J^- \cup J^0$, such that

$$f'(x_*) + \sum_{i \in J^- \cup J^0} u^i a_i = 0, \quad u^i \ge 0, \ i \in J^-, \quad u^i = 0 \ \text{if} \ (a_i, x_*) - b_i < 0, \ i \in J^-.$$

Proof. The result is obtained directly by using theorem 3.1 and $y$ in the form (3.6) for elements of $K^*(x_*)$, and also taking $u^i = 0$ for $i \in J^-$, $i \notin J^-(x_*)$.
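The conditions of theorem 3.2 are easy to check numerically at a candidate point. The following is a minimal sketch, not part of the original text: the function name `satisfies_theorem_3_2`, the tolerance `tol`, and the sample problem are assumptions chosen for illustration only.

```python
def dot(a, b):
    """Scalar product (a, b) of two vectors given as lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def satisfies_theorem_3_2(grad, x, ineq, eq, u_ineq, u_eq, tol=1e-9):
    """Check the conditions of theorem 3.2 at the point x.

    grad   -- gradient f'(x) at the candidate point
    ineq   -- list of (a_i, b_i) for constraints (a_i, x) - b_i <= 0
    eq     -- list of (a_i, b_i) for constraints (a_i, x) - b_i = 0
    u_ineq -- multipliers of the inequality constraints
    u_eq   -- multipliers of the equality constraints (any sign)
    """
    # Stationarity: f'(x) + sum_i u^i a_i = 0, checked componentwise.
    residual = list(grad)
    for (a, _), u in list(zip(ineq, u_ineq)) + list(zip(eq, u_eq)):
        residual = [r + u * aj for r, aj in zip(residual, a)]
    if any(abs(r) > tol for r in residual):
        return False
    for (a, b), u in zip(ineq, u_ineq):
        slack = dot(a, x) - b
        if slack > tol or u < -tol:          # feasibility and u^i >= 0
            return False
        if slack < -tol and abs(u) > tol:    # u^i = 0 on inactive constraints
            return False
    # Equality constraints must hold at x.
    return all(abs(dot(a, x) - b) <= tol for a, b in eq)


# Hypothetical example: f(x) = (x1 - 2)^2 + (x2 - 2)^2, x1 + x2 - 2 <= 0.
x_star = [1.0, 1.0]
grad = [2 * (x_star[0] - 2), 2 * (x_star[1] - 2)]   # f'(x_star) = (-2, -2)
ineq = [([1.0, 1.0], 2.0)]
print(satisfies_theorem_3_2(grad, x_star, ineq, [], [2.0], []))
```

Since the function in this example is convex, the check succeeding with the multiplier 2 confirms that the point (1, 1) is the minimum point; any other multiplier value violates the stationarity equation.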

Corollary 3.2. For point $x_*$ to be the minimum point of the convex differentiable function over the whole space it is necessary and sufficient to satisfy the equality

$$f'(x_*) = 0.$$

Corollary 3.3. For point $x_*$ to be the minimum point of the convex differentiable function in the set

$$x^j \ge 0, \quad j \in \mathcal{J},$$

where $\mathcal{J}$ is a subset of the set $j = 1, 2, \dots, n$, it is necessary and sufficient to satisfy the relations

$$\frac{\partial f(x_*)}{\partial x^j} \ge 0 \ \text{if} \ x_*^j = 0, \ j \in \mathcal{J},$$

$$\frac{\partial f(x_*)}{\partial x^j} = 0 \ \text{if} \ x_*^j > 0 \ \text{or} \ j \notin \mathcal{J}.$$
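The relations of corollary 3.3 can be tested coordinate by coordinate. The sketch below is an illustration under stated assumptions: the helper `satisfies_corollary_3_3`, the tolerance, and the sample function are hypothetical, and coordinates are numbered from 0 in the code rather than from 1 as in the text.

```python
def satisfies_corollary_3_3(grad, x, nonneg, tol=1e-9):
    """Check the relations of corollary 3.3.

    grad   -- partial derivatives of f at x
    x      -- candidate point
    nonneg -- set of 0-based indices j constrained by x^j >= 0
    """
    for j, (g, xj) in enumerate(zip(grad, x)):
        if j in nonneg:
            if xj < -tol:
                return False                 # point is not admissible
            if abs(xj) <= tol:
                if g < -tol:                 # on the boundary: derivative >= 0
                    return False
            elif abs(g) > tol:               # interior coordinate: derivative = 0
                return False
        elif abs(g) > tol:                   # unconstrained coordinate: derivative = 0
            return False
    return True


# Hypothetical example: f(x) = (x1 + 1)^2 + (x2 - 3)^2 with x1 >= 0, x2 free.
def grad_f(x):
    return [2.0 * (x[0] + 1.0), 2.0 * (x[1] - 3.0)]


candidate = [0.0, 3.0]
print(satisfies_corollary_3_3(grad_f(candidate), candidate, {0}))
```

At the candidate point the first coordinate lies on the boundary with a positive derivative and the second derivative vanishes, so both relations hold.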


The Kuhn-Tucker Theorem 


The necessary and sufficient conditions for a minimum considered above were based on an abstract description of an admissible set $X$ in which function $f(x)$ was minimized. In a broad class of problems the set is defined by a system of inequalities and equalities. This section considers the necessary conditions for a minimum in this concrete case.

Given convex functions $f_i(x)$, $i = 0, 1, \dots, m$, and a convex set $X$. It is required to minimize $f_0(x)$ with the following constraints:

$$f_i(x) \le 0, \ i = 1, \dots, m, \quad x \in X. \tag{3.7}$$

Theorem 3.3 (Kuhn-Tucker). Let $x_*$ be the minimum point of $f_0(x)$ with the constraints (3.7) and let there be a point $x_1 \in X$ such that

$$f_i(x_1) < 0, \quad i = 1, \dots, m.$$

Then there are numbers $u^i \ge 0$, $i = 1, \dots, m$, such that

$$f_0(x_*) + \sum_{i=1}^{m} u^i f_i(x_*) \le f_0(x) + \sum_{i=1}^{m} u^i f_i(x), \quad x \in X,$$

$$u^i f_i(x_*) = 0, \quad i = 1, \dots, m. \tag{3.8}$$

These conditions are necessary and sufficient.

Definition 3.2. Numbers $u^i$ used in the theorem are called Lagrange multipliers.


Dual Problem 


Consider again the problem of minimization of convex function $f_0(x)$ with constraints (3.7). Let $u^i \ge 0$, $i = 1, \dots, m$, be fixed. Let us compute

$$\varphi(u) = \inf_{x \in X} \Big[ f_0(x) + \sum_{i=1}^{m} u^i f_i(x) \Big]. \tag{3.9}$$

Thus function $\varphi(u)$ with $u \ge 0$ has been determined; it can take the value $-\infty$ as well. We leave it to the reader to prove that $\varphi(u)$ is a concave function.

Theorem 3.4. Let $u \ge 0$ and $x$ satisfy the constraints (3.7). Then

$$\varphi(u) \le f_0(x).$$

If however the conditions of theorem 3.3 are satisfied, then

$$\max_{u \ge 0} \varphi(u) = \min_{x \in D} f_0(x)$$

where $D$ is the set of points $x$ which satisfy (3.7).

Proof. For $x \in D$, $u \ge 0$ we have

$$\varphi(u) \le f_0(x) + \sum_{i=1}^{m} u^i f_i(x) \le f_0(x).$$

Let now the conditions of theorem 3.3 be satisfied. Then there exists a vector $u_0 \ge 0$ such that the relations (3.8) are satisfied for it. These relations imply

$$\varphi(u_0) = f_0(x_*) + \sum_{i=1}^{m} u_0^i f_i(x_*) = f_0(x_*),$$

and since $\varphi(u) \le f_0(x_*)$ it follows that vector $u_0$ provides the maximum of function $\varphi(u)$ in the domain $u \ge 0$ and

$$\max_{u \ge 0} \varphi(u) = \varphi(u_0) = f_0(x_*) = \min_{x \in D} f_0(x).$$

Q.E.D.
The problem of maximization of $\varphi(u)$ with the constraint $u \ge 0$ is known as the dual problem of convex programming and $u$ as the vector of dual variables.

The essence of theorem 3.4 can now be interpreted as follows: 
under the conditions of the Kuhn-Tucker theorem the value of the 
maximum of the objective function in the dual problem is that of 
the minimum of the objective function of the primal problem. The 
Lagrange multipliers of the primal problem are at the same time the 
solution of the dual problem. 
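A one-dimensional example makes the relation between the primal and dual problems concrete. The sketch below is illustrative only: the problem (minimize $x^2$ subject to $1 - x \le 0$ with $X = E^1$), the function names, and the grids are assumptions, not taken from the text. For this problem the infimum in (3.9) is attained at $x = u/2$, so $\varphi(u) = u - u^2/4$.

```python
def f0(x):
    """Objective of the hypothetical primal problem."""
    return x * x


def f1(x):
    """Constraint function: the admissible points satisfy f1(x) <= 0, i.e. x >= 1."""
    return 1.0 - x


def phi(u):
    """Dual function (3.9) for this example; the infimum over x is attained at x = u / 2."""
    x = u / 2.0
    return f0(x) + u * f1(x)


# Grids of dual variables u >= 0 and of admissible points x >= 1.
us = [0.25 * k for k in range(21)]          # 0, 0.25, ..., 5
xs = [1.0 + 0.25 * k for k in range(21)]    # 1, 1.25, ..., 6

# First statement of theorem 3.4: phi(u) <= f0(x) for admissible x and u >= 0.
weak_duality_holds = all(phi(u) <= f0(x) for u in us for x in xs)

# The maximum of phi over the grid coincides with the primal minimum f0(1).
u_best = max(us, key=phi)
print(weak_duality_holds, u_best, phi(u_best), f0(1.0))
```

The maximizing dual variable on the grid is $u = 2$, which is also the Lagrange multiplier of the primal problem, since $2x - u = 0$ at $x = 1$.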

The problem of convex programming often arises in the form of the minimization of $f_0(x)$ with the constraints

$$f_i(x) \le 0, \ i \in J^-, \qquad f_i(x) = 0, \ i \in J^0, \qquad x \in X \tag{3.10}$$

where $J^-$ and $J^0$ are finite sets of indices, $f_0(x)$ and $f_i(x)$, $i \in J^-$, are convex functions of $x$, $f_i(x)$, $i \in J^0$, are linear functions and $X$ is a convex set.

The dual problem for this case is formulated as the problem of maximizing $\varphi(u)$ with the constraints $u^i \ge 0$, $i \in J^-$, where $u$ has components $u^i$, $i \in J^- \cup J^0$, and

$$\varphi(u) = \inf_{x \in X} \Big[ f_0(x) + \sum_{i \in J^- \cup J^0} u^i f_i(x) \Big]. \tag{3.11}$$

Thus the number of dual variables is equal to the number of constraints (3.10), and the variable $u^i$ corresponding to the i-th constraint takes nonnegative values if it corresponds to an inequality constraint and arbitrary values if it corresponds to an equality constraint.


Problem of Linear Programming 


The problem of linear programming is the problem of minimization of the function $f_0(x) = (a_0, x)$ subject to constraints (3.4):

$$(a_i, x) - b_i \le 0, \ i \in J^-, \qquad (a_i, x) - b_i = 0, \ i \in J^0.$$

This problem coincides with the problem (3.10) if

$$f_i(x) = (a_i, x) - b_i, \quad X = E^n.$$

Lemma 3.6. If constraints (3.4) are consistent, then the problem of linear programming either has a solution $x_*$ or the value of the lower bound of $f_0(x) = (a_0, x)$ with the constraints (3.4) is $-\infty$.

The proof of this lemma can be found in textbooks on linear programming.

The necessary conditions characterizing $x_*$, the solution of the problem of linear programming, are obtained just by reformulating theorem 3.2, since $f_0'(x) = a_0$.
Theorem 3.5. In order that point $x_*$ be the solution of a problem of linear programming it is necessary and sufficient that there be numbers $u^i$, $i \in J^- \cup J^0$, such that

$$a_0 + \sum_{i \in J^- \cup J^0} u^i a_i = 0, \quad u^i \ge 0, \ i \in J^-, \quad u^i = 0 \ \text{if} \ (a_i, x_*) - b_i < 0, \ i \in J^-. \tag{3.12}$$
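Let us construct the dual of the problem of linear programming. By definition,

$$\varphi(u) = \inf_{x \in E^n} \Big[ f_0(x) + \sum_{i \in J^- \cup J^0} u^i f_i(x) \Big] = \inf_{x \in E^n} \Big[ (a_0, x) + \sum_{i \in J^- \cup J^0} u^i \big( (a_i, x) - b_i \big) \Big]$$

$$= \inf_{x \in E^n} \Big[ \Big( a_0 + \sum_{i \in J^- \cup J^0} u^i a_i, \ x \Big) - \sum_{i \in J^- \cup J^0} u^i b_i \Big] = \begin{cases} -\sum_{i \in J^- \cup J^0} u^i b_i & \text{if } a_0 + \sum_{i \in J^- \cup J^0} u^i a_i = 0, \\ -\infty & \text{if } a_0 + \sum_{i \in J^- \cup J^0} u^i a_i \ne 0. \end{cases}$$

Thus the dual of the problem of linear programming, i.e. the problem of the maximization of $\varphi(u)$ with $u^i \ge 0$, $i \in J^-$, is equivalent to the maximization of

$$-\sum_{i \in J^- \cup J^0} u^i b_i \tag{3.13}$$

with the constraints

$$a_0 + \sum_{i \in J^- \cup J^0} u^i a_i = 0, \quad u^i \ge 0, \ i \in J^-. \tag{3.14}$$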

Theorem 3.6. If the primal problem of linear programming has a 
solution, then the Lagrange multipliers are the solution of the dual prob- 
lem, and at the same time the value of the minimum of the objective 
function of the primal problem is equal to the value of the maximum of 
the objective function of the dual problem. 
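Theorems 3.5 and 3.6 can be verified by hand on a small problem. The sketch below is an assumption-laden illustration: the data (minimize $x^1 + 2x^2$ subject to $x^1 \ge 1$, $x^2 \ge 1$, written in the form (3.4)), the candidate solution, and the helper names are all hypothetical.

```python
def dot(a, b):
    """Scalar product (a, b) of two vectors given as lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


# Hypothetical data: minimize (a0, x) subject to (a_i, x) - b_i <= 0.
a0 = [1.0, 2.0]
constraints = [([-1.0, 0.0], -1.0),   # -x1 + 1 <= 0, i.e. x1 >= 1
               ([0.0, -1.0], -1.0)]   # -x2 + 1 <= 0, i.e. x2 >= 1
x_star = [1.0, 1.0]                   # candidate primal solution
u = [1.0, 2.0]                        # candidate Lagrange multipliers


def dual_feasible(a0, constraints, u, tol=1e-12):
    """Constraints (3.14): a0 + sum_i u^i a_i = 0 and u^i >= 0."""
    residual = list(a0)
    for (a, _), ui in zip(constraints, u):
        residual = [r + ui * aj for r, aj in zip(residual, a)]
    return all(abs(r) <= tol for r in residual) and all(ui >= 0 for ui in u)


def primal_feasible(constraints, x, tol=1e-12):
    """Constraints (3.4), inequalities only in this example."""
    return all(dot(a, x) - b <= tol for a, b in constraints)


primal_value = dot(a0, x_star)                                     # objective (a0, x)
dual_value = -sum(ui * b for (_, b), ui in zip(constraints, u))    # objective (3.13)
print(primal_feasible(constraints, x_star),
      dual_feasible(a0, constraints, u),
      primal_value, dual_value)
```

Both constraints are active at the candidate point, so the conditions (3.12) hold, and the two objective values coincide as theorem 3.6 states.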

In addition to constraints (3.4), problems of linear programming often contain constraints of the type

$$x^j \ge 0, \quad j \in \mathcal{J} \tag{3.4'}$$

where $\mathcal{J}$ is a subset of the set $j = 1, 2, \dots, n$. Using theorem 3.6 the reader can easily prove the following.

Theorem 3.7. If a problem of linear programming with constraints (3.4), (3.4') has a solution, then the Lagrange multipliers corresponding to constraints (3.4) are the solution of the dual problem: the maximization of

$$-\sum_{i \in J^- \cup J^0} u^i b_i$$

with constraints

$$a_0^j + \sum_{i \in J^- \cup J^0} u^i a_i^j \ge 0, \quad j \in \mathcal{J},$$

$$a_0^j + \sum_{i \in J^- \cup J^0} u^i a_i^j = 0, \quad j \notin \mathcal{J}, \qquad u^i \ge 0, \ i \in J^-,$$

where $a_i^j$ is the j-th component of vector $a_i$. The values of the minimum of the objective function of the primal problem and of the maximum of the dual one coincide.


Problem of Quadratic Programming 


The problem of quadratic programming consists in the minimization of the quadratic function

$$f_0(x) = \frac{1}{2} (x, Cx) + (d, x)$$

with constraints (3.4); here $C$ is an $n \times n$ symmetric, positive definite matrix, and $d$ is an n-dimensional vector.

Lemma 3.7. In a problem of quadratic programming the lower bound is either attained or is $-\infty$.

The proof of this lemma will be omitted.

Theorem 3.8. In order that point $x_*$ be the solution of a problem of quadratic programming, it is necessary and sufficient that there be numbers $u^i$, $i \in J^- \cup J^0$, such that

$$C x_* + d + \sum_{i \in J^- \cup J^0} u^i a_i = 0,$$

$$u^i = 0 \ \text{if} \ (a_i, x_*) - b_i < 0, \ i \in J^-; \qquad u^i \ge 0, \ i \in J^-.$$

The theorem can be proved by directly using theorem 3.2.
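Let now matrix $C$ be strictly positive definite, i.e. there is a $\gamma > 0$ such that $(x, Cx) \ge \gamma \| x \|^2$. In this case matrix $C$ is nonsingular and has an inverse matrix $C^{-1}$. Let us construct the dual problem:

$$\varphi(u) = \inf_{x \in E^n} \Big[ f_0(x) + \sum_{i \in J^- \cup J^0} u^i f_i(x) \Big] = \inf_{x \in E^n} \Big[ \frac{1}{2} (x, Cx) + (d, x) + \sum_{i \in J^- \cup J^0} u^i \big( (a_i, x) - b_i \big) \Big]$$

$$= \inf_{x \in E^n} \Big[ -\sum_{i \in J^- \cup J^0} u^i b_i + \frac{1}{2} (x, Cx) + \Big( x, \ d + \sum_{i \in J^- \cup J^0} u^i a_i \Big) \Big].$$

Equating the derivatives of the expression in brackets with respect to $x$ to zero, we find that the minimum is attained with

$$x(u) = -C^{-1} \Big( d + \sum_{i \in J^- \cup J^0} u^i a_i \Big).$$

At the same time

$$\varphi(u) = -\sum_{i \in J^- \cup J^0} u^i b_i - \frac{1}{2} \Big( d + \sum_{i \in J^- \cup J^0} u^i a_i, \ C^{-1} \Big( d + \sum_{i \in J^- \cup J^0} u^i a_i \Big) \Big). \tag{3.15}$$

Thus the dual problem consists in the maximization of (3.15) with constraints $u^i \ge 0$, $i \in J^-$.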

Theorem 3.9. If the minimum in the problem of quadratic programming is attained and matrix $C$ is strictly positive definite, then the Kuhn-Tucker theorem and theorem 3.4 are valid for the problem of quadratic programming. In this case the Lagrange multipliers of the primal problem are the solution of the dual problem, and if $u_0$ is the solution of the dual problem, then the solution of the primal one can be found by the following formula:

$$x(u_0) = -C^{-1} \Big( d + \sum_{i \in J^- \cup J^0} u_0^i a_i \Big). \tag{3.16}$$
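The recovery of the primal solution from the dual one can be traced on a problem small enough to solve in closed form. The sketch below is a simplification under stated assumptions: it treats only a diagonal matrix $C$ and a single inequality constraint, so that $C^{-1}$ and the maximizer of the dual function are available explicitly; the function name and the data are hypothetical.

```python
def dot(a, b):
    """Scalar product (a, b) of two vectors given as lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def solve_qp_one_constraint(c_diag, d, a, b):
    """Minimize 0.5*(x, Cx) + (d, x) subject to (a, x) - b <= 0.

    Assumes C = diag(c_diag) with positive entries (strictly positive
    definite) and exactly one inequality constraint.
    """
    c_inv_d = [dj / cj for dj, cj in zip(d, c_diag)]
    c_inv_a = [aj / cj for aj, cj in zip(a, c_diag)]
    # The dual function is a concave parabola in the single variable u;
    # setting its derivative -b - (a, C^{-1}(d + u a)) to zero and
    # clipping at zero gives its maximizer over u >= 0.
    u = max(0.0, -(b + dot(a, c_inv_d)) / dot(a, c_inv_a))
    # Primal solution recovered from the dual one, as in formula (3.16).
    x = [-(dj + u * aj) / cj for dj, aj, cj in zip(d, a, c_diag)]
    w = [dj + u * aj for dj, aj in zip(d, a)]
    dual_value = -u * b - 0.5 * sum(wj * wj / cj for wj, cj in zip(w, c_diag))
    primal_value = 0.5 * sum(cj * xj * xj for cj, xj in zip(c_diag, x)) + dot(d, x)
    return x, u, primal_value, dual_value


# Hypothetical data: C = I, d = (-2, -2), constraint x1 + x2 - 1 <= 0.
x, u, primal_value, dual_value = solve_qp_one_constraint(
    [1.0, 1.0], [-2.0, -2.0], [1.0, 1.0], 1.0)
print(x, u, primal_value, dual_value)
```

For these data the unconstrained minimum (2, 2) violates the constraint, the constraint becomes active at the solution, and the primal and dual objective values agree.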


4. NECESSARY CONDITIONS FOR A MINIMUM 


The general problem of mathematical programming consists in minimizing function $f_0(x)$, $x \in E^n$, in a set defined by a system of equalities and inequalities

$$f_i(x) \le 0, \ i \in J^-, \qquad f_i(x) = 0, \ i \in J^0, \qquad x \in X, \tag{4.1}$$

where $J^-$ and $J^0$ are finite sets of indices. In this section it is always assumed that $f_i(x)$ are continuously differentiable functions whose gradient is $f_i'(x)$. No assumption is made about set $X$ for the present.

The main object of this section is to deduce the necessary conditions which must be satisfied at point $x_*$ providing the minimum of $f_0(x)$ subject to constraints (4.1).


Basic Definitions 


Definition 4.1. A set $D$ of points which satisfy constraints (4.1) is called an admissible domain.

We assume that this set is nonempty.

Definition 4.2. Function $f_0(x)$ being minimized in $D$ is called an objective function.

Definition 4.3. Point $x_*$ satisfying (4.1) for which

$$f_0(x_*) \le f_0(x), \quad x \in D,$$

is called the minimum point.

Definition 4.4. Point $x_*$ is called a point of local minimum of $f_0(x)$ in $D$ if there is a neighbourhood $\Omega$ of point $x_*$ such that

$$f_0(x_*) \le f_0(x), \quad x \in D \cap \Omega.$$

In what follows the problem of the minimization of $f_0(x)$ will generally be considered. Obviously, the problem of the maximization of a function $f(x)$ in $D$ is reduced to that of the minimization in $D$ of the function $f_0(x) = -f(x)$.


Necessary Conditions for a Minimum 


Definition 4.5. Vector $p \in E^n$ defines the admissible direction with respect to set $X$ at point $x_0 \in X$ if for any vectors $e_i \in E^n$, $i \in J^0$, and any functions $r^i(\lambda)$, $i \in J^0$, which satisfy the condition

$$\lim_{\lambda \to +0} \frac{r^i(\lambda)}{\lambda} = 0, \tag{4.2}$$

the inclusion

$$x_0 + \lambda p + \sum_{i \in J^0} r^i(\lambda) e_i \in X \tag{4.3}$$

is valid with sufficiently small $\lambda > 0$.


The basic result to be proved in this section can now be formu- 
lated. 


Theorem 4.1. Let $x_*$ be a point of local minimum of $f_0(x)$ in $D$. Besides, let the set of admissible directions with respect to set $X$ at point $x_*$ form a convex cone $K(x_*)$. Then there are numbers $u^0$, $u^i$, $i \in J^- \cup J^0$, not all zero, such that

$$u^0 f_0'(x_*) + \sum_{i \in J^- \cup J^0} u^i f_i'(x_*) \in K^*(x_*),$$

$$u^i f_i(x_*) = 0, \ i \in J^-, \qquad u^i \ge 0, \ i = 0, \ i \in J^-. \tag{4.4}$$


Proof. Consider two cases.

(1) Vectors $f_i'(x_*)$, $i \in J^0$, are linearly dependent. Then there are numbers $u^i$, $i \in J^0$, not all zero, such that

$$\sum_{i \in J^0} u^i f_i'(x_*) = 0.$$

Taking $u^0 = 0$, $u^i = 0$, $i \in J^-$, we see that all the conditions of theorem 4.1 are satisfied.

(2) Vectors $f_i'(x_*)$, $i \in J^0$, are linearly independent. Then there are vectors $e_j$, $j \in J^0$, such that

$$(f_i'(x_*), e_j) = \delta_{ij}, \quad i, j \in J^0,$$

where $\delta_{ij} = 0$ if $i \ne j$ and $\delta_{ii} = 1$.

Let the total number of indices $i$ in the set $J^- \cup J^0$ be $m$. Consider set $Z$ in space $E^{m+1}$, the set being defined as follows. Vector $z$ belongs to $Z$ if and only if there is a vector $p \in K(x_*)$ such that $z^i = (f_i'(x_*), p)$ if $f_i(x_*) = 0$, $i \in J^- \cup J^0$, or $i = 0$. The components $z^i$ (of vector $z \in Z$) for which $f_i(x_*) < 0$ are arbitrary. Since $K(x_*)$ is a convex cone it is easily seen that $Z$ is a convex cone too.

Let us now define the set $P$. Vector $w$ belongs to $P$ if and only if

$$w^i < 0 \ \text{with} \ f_i(x_*) = 0, \ i \in J^-, \ \text{or} \ i = 0,$$

$$w^i = 0 \ \text{with} \ i \in J^0.$$

The remaining components of vector $w$ are arbitrary. Obviously, $P$ is a convex set too.

Let us demonstrate that $Z$ and $P$ do not intersect. Suppose that the opposite is true. Then there must be a vector $p_0 \in K(x_*)$ such that

$$(f_i'(x_*), p_0) < 0 \ \text{if} \ i \in J^- \ \text{and} \ f_i(x_*) = 0, \ \text{or} \ i = 0,$$

and

$$(f_i'(x_*), p_0) = 0 \ \text{with} \ i \in J^0. \tag{4.5}$$
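We now construct a system of equations in functions $r^i(\lambda)$, $i \in J^0$:

$$f_i \Big( x_* + \lambda p_0 + \sum_{j \in J^0} r^j e_j \Big) = 0, \quad i \in J^0. \tag{4.6}$$

Let us denote

$$g_i(\lambda, r) = f_i \Big( x_* + \lambda p_0 + \sum_{j \in J^0} r^j e_j \Big), \quad i \in J^0.$$

Then from (4.6) we have

$$g_i(\lambda, r) = 0, \quad i \in J^0, \tag{4.7}$$

which defines $r^i$ as implicit functions of $\lambda$. Since it was assumed that $f_i(x)$ are continuously differentiable functions, the functions $g_i(\lambda, r)$ are also continuously differentiable in $\lambda$ and $r^i$. Then using (4.5) we write

$$\frac{\partial g_i(0, 0)}{\partial \lambda} = (f_i'(x_*), p_0) = 0, \quad i \in J^0, \tag{4.8}$$

$$\frac{\partial g_i(0, 0)}{\partial r^j} = (f_i'(x_*), e_j) = \delta_{ij}, \quad i, j \in J^0. \tag{4.9}$$

Let us denote by $\partial g / \partial r$ a matrix whose components are $\partial g_i(0, 0) / \partial r^j$, $i, j \in J^0$. By the theorem on implicit functions, system (4.7) can be solved for $r$ with small $\lambda$ if matrix $\partial g / \partial r$ is nonsingular. In this case $r(\lambda)$ is a differentiable function of $\lambda$, $r(0) = 0$ and

$$r'(0) = -\Big( \frac{\partial g}{\partial r} \Big)^{-1} \frac{\partial g}{\partial \lambda}, \tag{4.10}$$

where $\partial g / \partial \lambda$ is a vector whose components are $\partial g_i(0, 0) / \partial \lambda$. In the case under consideration

$$\frac{\partial g}{\partial r} = I, \qquad \frac{\partial g}{\partial \lambda} = 0, \tag{4.11}$$

where $I$ is an identity matrix. This follows from (4.8) and (4.9).

Thus we see that with small $\lambda$ continuously differentiable functions $r^i(\lambda)$, $i \in J^0$, are defined. From (4.10) and (4.11) we have

$$\lim_{\lambda \to 0} \frac{r^i(\lambda) - r^i(0)}{\lambda} = \lim_{\lambda \to 0} \frac{r^i(\lambda)}{\lambda} = (r^i)'(0) = 0. \tag{4.12}$$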

Let now $x(\lambda) = x_* + \lambda p_0 + \sum_{i \in J^0} r^i(\lambda) e_i$. Then $x(\lambda) \in X$ with small $\lambda > 0$ by the definition of $K(x_*)$, for $p_0 \in K(x_*)$. Further, $f_i(x(\lambda)) = 0$, $i \in J^0$, since $r^i(\lambda)$ satisfy (4.6) by definition. Further, $f_0(x(\lambda)) < f_0(x_*)$ with small $\lambda > 0$. In fact, by Taylor's formula

$$f_0(x(\lambda)) = f_0(x_*) + (f_0'(\xi), x(\lambda) - x_*)$$

where $\xi$ is a point of the segment joining $x_*$ and $x(\lambda)$. Therefore

$$\frac{f_0(x(\lambda)) - f_0(x_*)}{\lambda} = (f_0'(\xi), p_0) + \sum_{i \in J^0} \frac{r^i(\lambda)}{\lambda} (f_0'(\xi), e_i).$$

Since from (4.5) $(f_0'(x_*), p_0) < 0$, and since $r^i(\lambda)/\lambda \to 0$ and $\xi \to x_*$ as $\lambda \to +0$, we obtain, with small positive $\lambda$,

$$\frac{f_0(x(\lambda)) - f_0(x_*)}{\lambda} < 0.$$

Similarly, if $i \in J^-$ and $f_i(x_*) = 0$, then by (4.5)

$$f_i(x(\lambda)) < 0, \quad i \in J^-, \ f_i(x_*) = 0.$$

If $f_i(x_*) < 0$, $i \in J^-$, then $f_i(x(\lambda)) < 0$ by continuity.

Thus point $x(\lambda)$ with small positive $\lambda$ satisfies all constraints (4.1) and $f_0(x(\lambda)) < f_0(x_*)$. But this contradicts the fact that $x_*$ is a point of local minimum.
The contradiction obtained shows that sets $Z$ and $P$ do not intersect. Since these sets are convex, they can be separated. This means that there are numbers $u^0$, $u^i$, $i \in J^- \cup J^0$, not all of them zero, such that

$$u^0 z^0 + \sum_{i \in J^- \cup J^0} u^i z^i \ge u^0 w^0 + \sum_{i \in J^- \cup J^0} u^i w^i, \quad z \in Z, \ w \in P. \tag{4.13}$$

The structure of sets $Z$ and $P$ makes it possible to draw certain conclusions about numbers $u^i$. By the definition of $P$, $w^0$ can take any value less than zero. Hence $u^0 \ge 0$, otherwise the right-hand side could take arbitrarily large values and this contradicts (4.13). Similarly,

$$u^i \ge 0 \ \text{if} \ f_i(x_*) = 0, \ i \in J^-. \tag{4.14}$$

Further, if $i \in J^-$ and $f_i(x_*) < 0$, then $w^i$ is arbitrary. Therefore for the inequality (4.13) to be valid it is necessary that

$$u^i = 0 \ \text{if} \ f_i(x_*) < 0, \ i \in J^-. \tag{4.15}$$

Letting now $w$ in (4.13) tend to zero so that $w \in P$, and taking into account (4.15) and the definition of $Z$, we obtain

$$u^0 (f_0'(x_*), p) + \sum_{i \in J^- \cup J^0} u^i (f_i'(x_*), p) \ge 0, \quad p \in K(x_*),$$

or

$$\Big( u^0 f_0'(x_*) + \sum_{i \in J^- \cup J^0} u^i f_i'(x_*), \ p \Big) \ge 0, \quad p \in K(x_*). \tag{4.16}$$

The statements proved, (4.14), (4.15), (4.16), are obviously equivalent to the above theorem. This completes the proof.

Corollary 4.1. If $X = E^n$, then for point $x_*$ to provide a local minimum it is necessary that there be numbers $u^i$, not all zero, such that

$$u^0 f_0'(x_*) + \sum_{i \in J^- \cup J^0} u^i f_i'(x_*) = 0,$$

$$u^0 \ge 0, \quad u^i \ge 0, \ i \in J^-, \quad u^i f_i(x_*) = 0, \ i \in J^-. \tag{4.17}$$

Proof. If $X = E^n$, then any direction $p$ is admissible, i.e. $K(x_*) = E^n$. Therefore cone $K^*(x_*)$ consists of the zero vector only, and relations (4.4) directly take the form (4.17).
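Corollary 4.2. For point $x_*$ to provide the minimum of $f_0(x)$ in the domain

$$x^j \ge 0, \quad j \in \mathcal{J},$$

where $\mathcal{J}$ is a subset of the set $j = 1, 2, \dots, n$, it is necessary to satisfy the conditions

$$\frac{\partial f_0(x_*)}{\partial x^j} \ge 0 \ \text{if} \ x_*^j = 0, \ j \in \mathcal{J},$$

$$\frac{\partial f_0(x_*)}{\partial x^j} = 0 \ \text{if} \ x_*^j > 0, \ j \in \mathcal{J}, \ \text{or} \ j \notin \mathcal{J}. \tag{4.18}$$

Proof. Constraints $x^j \ge 0$, $j \in \mathcal{J}$, can be rewritten in the form $(-a_j, x) \le 0$, $j \in \mathcal{J}$, where $a_j$ is a vector whose components are $a_j^i = \delta_{ij}$, $i = 1, \dots, n$. Using the preceding corollary we obtain that there are numbers $u^0$ and $u^j$, $j \in \mathcal{J}$, not all zero, such that

$$u^0 f_0'(x_*) - \sum_{j \in \mathcal{J}} u^j a_j = 0, \quad u^0 \ge 0, \ u^j \ge 0, \ u^j x_*^j = 0, \ j \in \mathcal{J}. \tag{4.19}$$

The first equality can be written in terms of components in the following form:

$$u^0 \frac{\partial f_0(x_*)}{\partial x^i} - \sum_{j \in \mathcal{J}} u^j \delta_{ij} = 0$$

or

$$u^0 \frac{\partial f_0(x_*)}{\partial x^i} = u^i, \quad i \in \mathcal{J},$$

$$u^0 \frac{\partial f_0(x_*)}{\partial x^i} = 0, \quad i \notin \mathcal{J}. \tag{4.20}$$

It follows from (4.20) that $u^0 > 0$, since if $u^0 = 0$ then all $u^i = 0$, and this contradicts corollary 4.1. Therefore we can assume that $u^0 = 1$. It follows directly from (4.20) and (4.19) that corollary 4.2 is valid.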

Definition 4.6. The point $x_*$ providing the minimum of $f_0(x)$ with constraints (4.1), where $X = E^n$, is called a regular point if gradients $f_i'(x_*)$ for indices $i$ such that $i \in J^- \cup J^0$, $f_i(x_*) = 0$, are linearly independent.

Corollary 4.3. If $x_*$ is a regular point, then in (4.17) we can take $u^0 = 1$ and the multipliers $u^i$, $i \in J^- \cup J^0$, are unique.

Proof. In fact, $u^0 > 0$, since if $u^0 = 0$ then by (4.17) the gradients $f_i'(x_*)$ for which $i \in J^- \cup J^0$, $f_i(x_*) = 0$, would prove linearly dependent. Further, by (4.17) $u^i = 0$ if $f_i(x_*) < 0$. Therefore the first of relations (4.17) with $u^0 = 1$ yields the expansion

$$f_0'(x_*) = -\sum_{\substack{i \in J^- \cup J^0 \\ f_i(x_*) = 0}} u^i f_i'(x_*)$$

of vector $f_0'(x_*)$ in linearly independent vectors $f_i'(x_*)$ and defines $u^i$ uniquely.
Let there be now only equality constraints in problem (4.1),

$$f_i(x) = 0, \quad i \in J^0,$$

and $X = E^n$. If with such constraints $x_*$ provides the minimum of $f_0(x)$ and gradients $f_i'(x_*)$ are linearly independent, then the necessary conditions for a minimum (4.17) can be written in the form

$$f_0'(x_*) + \sum_{i \in J^0} u^i f_i'(x_*) = 0.$$

The set of vectors $p$ such that

$$(f_i'(x_*), p) = 0, \quad i \in J^0,$$

in the case under consideration is called a tangent (bounding) manifold at point $x_*$ to the set

$$D = \{ x : f_i(x) = 0, \ i \in J^0 \}.$$

Corollary 4.4. For point $x_*$ at which $f_i'(x_*)$, $i \in J^0$, are linearly independent to provide the minimum of the function $f_0(x)$ in set $D$ it is necessary that gradient $f_0'(x_*)$ be orthogonal to the manifold tangent to $D$ at point $x_*$, i.e. if $p$ belongs to the bounding manifold then $(f_0'(x_*), p) = 0$. In other words, the projection of vector $f_0'(x_*)$ on the tangent manifold vanishes.

Proof. If $x_*$ with the above assumptions provides the minimum then

$$(f_0'(x_*), p) = -\sum_{i \in J^0} u^i (f_i'(x_*), p) = 0$$

for any vector $p$ of the tangent manifold. Conversely, if $(f_0'(x_*), p)$ is equal to zero with any $p$ which belongs to the bounding manifold, then we can write

$$f_0'(x_*) = -\sum_{i \in J^0} u^i f_i'(x_*).$$

This follows from lemma 1.6 if each of the equalities $(f_i'(x_*), p) = 0$ is written down in the form of two inequalities:

$$(f_i'(x_*), p) \ge 0, \qquad -(f_i'(x_*), p) \ge 0.$$
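Corollary 4.4 can be seen at work on a problem with one equality constraint. The sketch below is illustrative and rests on assumed data: minimizing $x^1 + x^2$ on the circle $(x^1)^2 + (x^2)^2 - 2 = 0$, whose minimum point is $(-1, -1)$. The variable names are hypothetical.

```python
def dot(a, b):
    """Scalar product (a, b) of two vectors given as lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


# Hypothetical problem: f0(x) = x1 + x2, f1(x) = x1^2 + x2^2 - 2 = 0.
x_star = [-1.0, -1.0]
grad_f0 = [1.0, 1.0]
grad_f1 = [2.0 * x_star[0], 2.0 * x_star[1]]     # (-2, -2)

# With one constraint the multiplier follows from projecting f0' onto f1'.
u = -dot(grad_f0, grad_f1) / dot(grad_f1, grad_f1)

# Necessary condition: f0'(x_star) + u f1'(x_star) = 0.
residual = [g0 + u * g1 for g0, g1 in zip(grad_f0, grad_f1)]

# A vector of the tangent manifold: (f1'(x_star), p) = 0.
p = [1.0, -1.0]
print(u, residual, dot(grad_f1, p), dot(grad_f0, p))
```

The multiplier equals 1/2, the residual of the necessary condition vanishes, and the gradient of the objective function is orthogonal to the tangent vector.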
Minimax Problem


It is required to find the minimum point of the function

$$f(x) = \max_{i = 1, \dots, m} f_i(x) \tag{4.21}$$

where $f_i(x)$ are continuously differentiable functions, $x \in E^n$. In order to apply the results obtained in the preceding subsection let us reduce the problem of minimization of $f(x)$ to the equivalent problem of mathematical programming. It is easily seen that if we introduce a supplementary variable $x^{n+1}$, then $x_*$, the point of minimum of $f(x)$, will also be the solution of the following problem: to find the minimum of $g_0(x, x^{n+1}) = x^{n+1}$ with the constraints

$$g_i(x, x^{n+1}) = f_i(x) - x^{n+1} \le 0, \quad i = 1, \dots, m. \tag{4.22}$$

The minimum value of $g_0(x, x^{n+1})$ is $x_*^{n+1} = f(x_*)$.

Let us apply corollary 4.1 of theorem 4.1 to problem (4.22). It is necessary to take into account that the problem will now be solved in space $E^{n+1}$ of variables $x^1, \dots, x^n, x^{n+1}$, so that the gradients of the functions $g_i(x, x^{n+1})$ have the form

$$g_i'(x, x^{n+1}) = \begin{pmatrix} f_i'(x) \\ -1 \end{pmatrix}, \ i = 1, \dots, m, \qquad g_0'(x, x^{n+1}) = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$

By corollary 4.1 we now have: there are numbers $u^0$, $u^i$, $i = 1, \dots, m$, not all zero, such that

$$u^0 \begin{pmatrix} 0 \\ 1 \end{pmatrix} + \sum_{i=1}^{m} u^i \begin{pmatrix} f_i'(x_*) \\ -1 \end{pmatrix} = 0,$$

$$u^i \ge 0, \quad i = 0, 1, \dots, m,$$

$$u^i (f_i(x_*) - x_*^{n+1}) = u^i (f_i(x_*) - f(x_*)) = 0, \quad i = 1, \dots, m. \tag{4.23}$$

The first of relations (4.23) shows that $u^0 = \sum_{i=1}^{m} u^i$. Hence, since $u^i \ge 0$, we have $u^0 > 0$, for with $u^0 = 0$ all $u^i$ would also be zero. Since expression (4.23) is homogeneous with respect to $u^i$ we can take $u^0 = 1$. Thus we have finally obtained the following result.

Theorem 4.2. For point $x_*$ to provide the minimum of $f(x)$ defined by relation (4.21) it is necessary that there be numbers $u^i$, $i = 1, \dots, m$, such that

$$\sum_{i=1}^{m} u^i f_i'(x_*) = 0,$$

$$\sum_{i=1}^{m} u^i = 1, \quad u^i \ge 0, \quad i = 1, \dots, m,$$

$$u^i (f_i(x_*) - f(x_*)) = 0, \quad i = 1, \dots, m. \tag{4.24}$$
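The conditions (4.24) lend themselves to a direct numerical check. The following sketch is not from the original text: the helper `check_theorem_4_2`, its tolerance, and the sample functions $f_1(x) = (x - 1)^2$, $f_2(x) = (x + 1)^2$ (whose maximum is minimized at $x = 0$) are assumptions made for illustration.

```python
def check_theorem_4_2(values, grads, u, tol=1e-9):
    """Check conditions (4.24) at a candidate point.

    values -- the numbers f_i(x) at the candidate point
    grads  -- the gradients f_i'(x), each a list of n components
    u      -- candidate multipliers u^i
    """
    f_max = max(values)                       # f(x) = max_i f_i(x)
    n = len(grads[0])
    combo = [sum(ui * g[j] for ui, g in zip(u, grads)) for j in range(n)]
    stationarity = all(abs(c) <= tol for c in combo)
    normalized = abs(sum(u) - 1.0) <= tol and all(ui >= -tol for ui in u)
    complementary = all(abs(ui * (v - f_max)) <= tol for ui, v in zip(u, values))
    return stationarity and normalized and complementary


# Hypothetical example in E^1: f_1(x) = (x - 1)^2, f_2(x) = (x + 1)^2.
x_star = 0.0
values = [(x_star - 1.0) ** 2, (x_star + 1.0) ** 2]
grads = [[2.0 * (x_star - 1.0)], [2.0 * (x_star + 1.0)]]
print(check_theorem_4_2(values, grads, [0.5, 0.5]))
```

Both functions attain the maximum at the candidate point, their gradients are opposite, and equal weights of 1/2 satisfy all three groups of conditions.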
Necessary Conditions
of the Second Order


Let us again return to the problem of the minimization of $f_0(x)$ with constraints (4.1), $X = E^n$. We use the notation $L(x, u)$ as follows:

$$L(x, u) = f_0(x) + \sum_{i \in J^- \cup J^0} u^i f_i(x). \tag{4.25}$$

Assume that point $x_*$, the solution of this minimization problem, is regular (definition 4.6). Then using corollary 4.3 of theorem 4.1 we can write the first of relations (4.17) in the form

$$L_x'(x_*, u) = 0. \tag{4.26}$$

Assume now that all functions $f_i(x)$ are twice continuously differentiable, i.e. that there are continuous matrices of second derivatives $f_i''(x)$. Then the matrix of second derivatives $L_{xx}''(x_*, u)$ of function $L(x, u)$ with respect to $x$ is also defined.

The assumption that $x_*$ is a regular point implies that relation (4.17) uniquely determines the multipliers $u^i$, $i \in J^- \cup J^0$. We introduce the following notations:

$$J_0^-(x_*) = \{ i : u^i > 0, \ i \in J^- \},$$

$$J^-(x_*) = \{ i : f_i(x_*) = 0, \ i \in J^- \}.$$

Using (4.17) we have $J_0^-(x_*) \subset J^-(x_*)$. Let vector $p$ satisfy the relations

$$(f_i'(x_*), p) \le 0, \quad i \in J^-(x_*), \ i \notin J_0^-(x_*),$$

$$(f_i'(x_*), p) = 0, \quad i \in J_0^-(x_*) \cup J^0. \tag{4.27}$$

Assume

$$J_p(x_*) = \{ i \in J^-(x_*) \cup J^0 : (f_i'(x_*), p) = 0 \}. \tag{4.28}$$

Point $x_*$ being regular, vectors $f_i'(x_*)$, $i \in J_p(x_*)$, are linearly independent. Therefore it can be demonstrated that there is a function $r(\lambda) \in E^n$ such that

$$f_i(x(\lambda)) = 0, \quad i \in J_p(x_*), \tag{4.29}$$

where $x(\lambda) = x_* + \lambda p + r(\lambda)$, $\lim_{\lambda \to +0} r(\lambda)/\lambda = 0$. The proof is quite similar to that of theorem 4.1.
Further, if $i \notin J_p(x_*)$, then either $f_i(x_*) < 0$ or $(f_i'(x_*), p) < 0$, which ensures the inequality $f_i(x(\lambda)) < 0$ with small $\lambda$. Thus point $x(\lambda)$ with small $\lambda$ satisfies all the constraints (4.1), $X = E^n$. Using this fact as well as (4.27)-(4.29) we obtain

$$f_0(x(\lambda)) = L(x(\lambda), u),$$

since if $u^i \ne 0$ then from (4.29) $f_i(x(\lambda)) = 0$. At the same time we obtain from (4.17) that $f_0(x_*) = L(x_*, u)$. Taking into account that $x(\lambda)$ satisfies all of the relations (4.1) and that $x_*$ is the minimum point of $f_0(x)$ with constraints (4.1) we obtain for small $\lambda$:

$$L(x(\lambda), u) \ge L(x_*, u).$$

Expanding $L(x(\lambda), u)$ to second-order terms in powers of $\lambda$ we obtain

$$L(x(\lambda), u) = L(x_*, u) + (L_x'(x_*, u), x(\lambda) - x_*) + \frac{1}{2} \big( L_{xx}''(\xi(\lambda), u) (x(\lambda) - x_*), \ x(\lambda) - x_* \big) \ge L(x_*, u),$$

where $\xi(\lambda)$ is a point in the segment which joins $x_*$ and $x(\lambda)$, so that $\xi(\lambda) \to x_*$ as $\lambda \to 0$. Using (4.26) we obtain

$$\frac{1}{2} \lambda^2 \Big( L_{xx}''(\xi(\lambda), u) \Big( p + \frac{r(\lambda)}{\lambda} \Big), \ p + \frac{r(\lambda)}{\lambda} \Big) \ge 0.$$

Dividing by $\lambda^2$ and taking $\lambda \to 0$ we finally obtain

$$(L_{xx}''(x_*, u) p, p) \ge 0.$$

The following theorem is proved.

Theorem 4.3. Let functions $f_i(x)$ be twice continuously differentiable and $x_*$ be a regular point of minimum of $f_0(x)$ with constraints (4.1), $X = E^n$. Then there are numbers $u^i$, $i \in J^- \cup J^0$, such that

$$L_x'(x_*, u) = 0, \quad u^i \ge 0, \ i \in J^-, \quad u^i f_i(x_*) = 0, \ i \in J^-,$$

and

$$(L_{xx}''(x_*, u) p, p) \ge 0$$

for all $p$ which satisfy inequalities (4.27).


5. SOME ADDITIONAL INFORMATION


The Newton-Leibnitz formula, which establishes the connection between a scalar function $f(x)$ and its derivative, is treated in mathematical analysis. This formula is generalized and applies to operator functions.

If $F(x)$ is a differentiable operator function defined in an open convex set $\Omega \subset E^n$ and $x, x + h \in \Omega$, then

$$F(x + h) - F(x) = \int_0^1 F'(x + \alpha h) h \, d\alpha. \tag{5.1}$$

The proof of the formula (which is valid also for operators defined in functional spaces) can be found, for instance, in the book by A. N. Kolmogorov and S. V. Fomin.

Let us state one more property of operator functions.

If $F(x)$ is a nonlinear differentiable operator function, then for any $x, h, y \in E^n$ the following formula is valid:

$$(F(x + h) - F(x), y) = (F'(x + \theta h) h, y), \quad 0 < \theta < 1. \tag{5.2}$$

This formula is called Lagrange's formula for operators (or Lagrange's generalized formula). Its proof (for operators of a more general form) can be found in M. M. Vainberg's monograph [1].

In the following chapters we shall have many occasions to use Taylor's formula with the remainder term in Lagrange's form.

If $f(x)$ is a twice continuously differentiable function in a convex set $\Omega$, then for any $x, x + h \in \Omega$ and $\alpha \in [0, 1]$

$$f(x + \alpha h) - f(x) = \alpha (f'(x + \alpha \theta_1 h), h)$$

and

$$f(x + \alpha h) = f(x) + \alpha (f'(x), h) + \frac{\alpha^2}{2} (f''(x + \alpha \theta_2 h) h, h)$$

where $\theta_1, \theta_2 \in [0, 1]$.
METHODS OF UNCONSTRAINED
FUNCTION MINIMIZATION


This chapter is devoted to the problem of minimization of the function $f(x)$ defined in an n-dimensional Euclidean space $E^n$. Accordingly, in this chapter $x$ is always an n-dimensional vector.

In solving the problem we shall use iterative processes of the type

$$x_{k+1} = x_k + \alpha_k p_k \tag{0.1}$$

where $p_k$ is a vector determining the direction of motion from point $x_k$ and $\alpha_k$ is a numerical factor whose value determines the length of the step in the direction of $p_k$.

The process (0.1) will be defined if the methods of constructing vector $p_k$ and computing the value of $\alpha_k$ are given for every iteration. The properties of the process (the values of the function for different elements of the sequence $\{x_k\}$, convergence of the sequence to the solution, the rate of convergence, etc.) depend directly on the method chosen. At the same time various methods of constructing vector $p_k$ and determining $\alpha_k$ require different amounts of calculations and involve different constraints on the function to be minimized.

Let us state the considerations on which we shall base our choice of the direction of motion and the step length.

In order to get nearer to point $x_*$ (in the general case $x_*$ is the point at which the necessary conditions for an extremum of function $f(x)$ are satisfied, possibly within a certain accuracy), one should naturally move from point $x_k$ in the direction in which the function decreases, i.e. in the direction of descent. If point $x_k$ is not the point of minimum or a stationary point, then there is an infinite number of vectors $p$ which determine the direction of descent from point $x_k$, and each such vector is defined by

$$(f'(x_k), p) < 0$$

($f(x)$ is differentiable).
This is seen from the following argument.

Let $x = x_k + \alpha p$. Expansion of the function in Taylor's series about $x_k$ (it is obviously assumed that the function is differentiable and has a sufficient number of derivatives) gives

$$f(x) = f(x_k) + \alpha (f_k', p) + \frac{\alpha^2}{2} (f_{kc}'' p, p).$$

We set here $f_k' = f'(x_k)$, $f_{kc}'' = f''(x_{kc})$; $x_{kc} = x_k + \theta (x - x_k)$, $\theta \in [0, 1]$. For the sake of brevity, these notations will often be used further on in this chapter.

If $(f_k', p) < 0$ then, at least with small values of $\alpha > 0$, $f(x) < f(x_k)$, since the sign of the difference $f(x) - f(x_k)$ is then determined by the term which is linear with respect to $\alpha$.
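The argument above can be observed numerically. In the sketch below the test function $f(x) = (x^1)^2 + 3 (x^2)^2$, the point, the direction, and the step values are all assumptions chosen for illustration; the example shows that a direction with $(f_k', p) < 0$ decreases the function for a small step, while a large step along the same direction may increase it.

```python
def dot(a, b):
    """Scalar product (a, b) of two vectors given as lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def f(x):
    """Hypothetical test function f(x) = x1^2 + 3 x2^2."""
    return x[0] ** 2 + 3.0 * x[1] ** 2


def grad_f(x):
    """Gradient of the test function."""
    return [2.0 * x[0], 6.0 * x[1]]


def step(x, p, alpha):
    """The point x + alpha * p."""
    return [xi + alpha * pi for xi, pi in zip(x, p)]


x_k = [1.0, 1.0]
p = [-1.0, 0.0]

slope = dot(grad_f(x_k), p)           # (f'_k, p); negative for a descent direction
small = f(step(x_k, p, 0.1))          # value after a small step
large = f(step(x_k, p, 3.0))          # value after a large step
print(slope, f(x_k), small, large)
```

The second-order term dominates for the large step, which is why the choice of the factor $\alpha$ requires a separate rule.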

By applying various methods in choosing the direction of the descent and factor $\alpha_k$, we can obtain different minimization algorithms.


1. GRADIENT METHODS
Method of Steepest Descent


The simplest approach to the choice of the direction $p_k$ in order to satisfy the condition $(f_k', p_k) < 0$ (i.e. of the direction of the descent of $f(x)$) is to assume $p_k = -f_k'$.

The iterative process

$$x_{k+1} = x_k - \alpha_k f'(x_k), \quad \alpha_k > 0, \quad k = 0, 1, \dots \tag{1.1}$$

which results from such a choice of the direction of motion is called the method of steepest descent or gradient method.

In terms of coordinates, process (1.1) is written down in the following form:

$$x_{k+1}^i = x_k^i - \alpha_k \frac{\partial f(x_k)}{\partial x^i}, \quad i = 1, \dots, n.$$

At present the method of steepest descent is one of the best known minimization methods.

The popularity of the method has been favoured by its being comparatively simple and suitable for application to the minimization of a very broad class of functions.

We turn now to the study of the properties of the algorithm (1.1). First of all we describe the method of choosing the magnitude of the scalar factor $\alpha_k$.

(1) Take an arbitrary value of $\alpha$ (the same at all iterations) and determine point $x = x_k - \alpha f_k'$.

(2) Compute $f(x) = f(x_k - \alpha f_k')$.

(3) Verify the inequality

$$f(x) - f(x_k) \le \varepsilon \alpha (f_k', p_k) \tag{1.2}$$

where $0 < \varepsilon < 1$ is an arbitrarily chosen constant (the same with any $k = 0, 1, \dots$).

(4) If inequality (1.2) is satisfied, then the value of $\alpha$ is taken to be the sought one: $\alpha_k = \alpha$. However, if the inequality is not satisfied, we reduce $\alpha$ (multiplying $\alpha$ by an arbitrary $\lambda < 1$) until inequality (1.2) is satisfied.
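Steps (1)-(4) together with process (1.1) can be sketched as a short program. This is a minimal illustration under stated assumptions, not a definitive implementation: the function name `steepest_descent`, the default values of `alpha`, `eps` and `lam`, the stopping rule on the gradient norm, the cap on the number of step reductions, and the test function are all choices made for this example.

```python
def steepest_descent(f, grad, x0, alpha=1.0, eps=0.5, lam=0.5,
                     tol=1e-6, max_iter=10000):
    """Gradient method (1.1) with the step chosen by steps (1)-(4).

    alpha -- trial step taken at the start of every iteration
    eps   -- constant 0 < eps < 1 from inequality (1.2)
    lam   -- reduction factor 0 < lam < 1 applied to the step
    tol   -- stop when the gradient norm is at most tol (an assumption)
    """
    x = list(x0)
    for k in range(max_iter):
        g = grad(x)
        g_norm_sq = sum(gi * gi for gi in g)
        if g_norm_sq ** 0.5 <= tol:
            break
        fx = f(x)
        a = alpha
        # With p_k = -f'_k inequality (1.2) reads
        # f(x - a g) - f(x) <= -eps * a * ||g||^2.
        # The cap of 60 reductions is a safeguard added for this sketch.
        for attempt in range(60):
            trial = [xi - a * gi for xi, gi in zip(x, g)]
            if f(trial) - fx <= -eps * a * g_norm_sq:
                break
            a *= lam
        x = trial
    return x


# Hypothetical test function f(x) = x1^2 + 3 x2^2 with minimum at the origin.
def f(x):
    return x[0] ** 2 + 3.0 * x[1] ** 2


def grad_f(x):
    return [2.0 * x[0], 6.0 * x[1]]


x_min = steepest_descent(f, grad_f, [1.0, 1.0])
print(x_min)
```

Starting every iteration from the same trial value of `alpha` follows step (1); reusing the previously accepted step instead would be a different variant of the rule.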

The above method of choosing $\alpha_k$ needs substantiation; the conditions for the existence of nonzero values of $\alpha$ which satisfy inequality (1.2) must be established. Such a substantiation is given in the following theorem.

Theorem 1.1. If function $f(x)$ is bounded from below, its gradient $f'(x)$ satisfies the Lipschitz condition

$$\| f'(x) - f'(y) \| \le R \| x - y \| \tag{1.3}$$

with any $x, y \in E^n$, and the choice of the value of $\alpha_k$ is made as described above, then in the process (1.1) $\| f_k' \| \to 0$ as $k \to \infty$ whatever the initial point $x_0$.

Proof. By the mean-value theorem we have

$$f(x) - f(x_k) = (f'(x_{kc}), x - x_k) \tag{1.4}$$

where $x_{kc} = x_k + \theta (x - x_k)$, $\theta \in [0, 1]$. We note that in what follows the index $kc$ ($c$) will be used to denote an intermediate point in a corresponding segment.

Equality (1.4) can be transformed as follows:

$$f(x) - f_k = (f_k', x - x_k) + (f_{kc}' - f_k', x - x_k).$$

Hence, noticing that $x - x_k = -\alpha f_k'$ and using (1.3), we obtain

$$f(x) - f_k \le -\alpha (f_k', f_k') + \alpha R \| x_{kc} - x_k \| \, \| f_k' \| \le -\alpha \| f_k' \|^2 + \alpha R \| x - x_k \| \, \| f_k' \| = \alpha \| f_k' \|^2 (-1 + \alpha R).$$

The estimate obtained shows that there are values $\alpha \ne 0$ such that the inequality (1.2) is satisfied; to obtain this result it suffices to choose $\alpha$ such that $-1 + \alpha R \le -\varepsilon$. This is always feasible, since $R$ is a finite quantity and $0 < \varepsilon < 1$. Consequently, (1.2) will always be satisfied with $\alpha \le (1 - \varepsilon)/R$. Thus, choosing $\alpha_k$ in accordance with the above algorithm we obtain

$$f_{k+1} - f_k \le -\varepsilon \alpha_k \| f_k' \|^2, \tag{1.5}$$

i.e. with any $k$ we have $f_{k+1} - f_k < 0$ (provided $\| f_k' \| \ne 0$). Since by hypothesis the function has a lower bound, the last inequality gives as $k \to \infty$

$$f_{k+1} - f_k \to 0. \tag{1.6}$$

It follows from (1.5) that

$$\| f_k' \|^2 \le \frac{f_k - f_{k+1}}{\varepsilon \alpha_k}. \tag{1.7}$$

We note that this algorithm for choosing $\alpha_k$ ensures that $\alpha_k \ge \bar{\alpha} > 0$ with any $k$, where $\bar{\alpha}$ is a constant independent of $k$, since, as was mentioned, the inequality (1.2) is satisfied whenever $\alpha \le (1 - \varepsilon)/R$. In view of this remark, conditions (1.6) and (1.7) imply that $\| f_k' \| \to 0$ as $k \to \infty$, and this proves the theorem.

The class of functions satisfying the requirements of theorem 1.1 is very broad. Such functions can have no minimum point at all, can have local minima, saddle points, etc. Theorem 1.1 shows that the gradient method provides for convergence either to the exact lower bound $\inf f(x)$ or to a value of the function at a certain stationary point. The convergence of the sequence $\{x_k\}$ to a stationary point (if such a point exists) also takes place. It is difficult, however, to determine the rate of convergence under the conditions of theorem 1.1. If the requirements concerning the smoothness and convexity of the function are sufficiently strict, then not only can the convergence of the sequence $\{x_k\}$ be proved but the rate of convergence can also be estimated.

Theorem 1.2. Let $f(x)$ be a twice continuously differentiable function whose matrix of second derivatives satisfies the conditions
$$m\|y\|^2 \le (f''(x)y, y) \le M\|y\|^2, \qquad M \ge m > 0 \tag{1.8}$$
for any $x, y \in E^n$, and let the sequence $\{x_k\}$ be constructed by method (1.1), where $\alpha_k$ is chosen in the way described above. Then for any initial point $x_0$ we have $x_k \to x_*$, $f(x_k) \to f(x_*)$, where $x_*$ is the (unique) point of minimum of $f(x)$, and the following estimates of the rate of convergence hold:
$$f_k - f_* \le q^k (f_0 - f_*), \qquad \|x_k - x_*\| \le C q^{k/2}, \qquad C < \infty,\ 0 < q < 1. \tag{1.9}$$
Proof. The existence of the unique minimizer of $f(x)$ under the conditions of the theorem follows from the results of lemma 3.2 (Ch. I). Therefore we have only to prove the convergence of sequence $\{x_k\}$ to point $x_*$ and to obtain estimates (1.9). Let us first establish that the first of estimates (1.9) holds. Using Taylor's formula we obtain
$$f(x_*) = f(x) + (f'(x),\, x_* - x) + \tfrac12 (f''(x_c)(x_* - x),\, x_* - x).$$
Hence, applying (1.8), we have
$$f(x) - f(x_*) \le (f'(x),\, x - x_*) - \frac{m}{2}\|x - x_*\|^2 \le \|f'(x)\|\,\|x - x_*\| - \frac{m}{2}\|x - x_*\|^2. \tag{1.10}$$
At the same time (since $f'(x_*) = 0$)
$$f(x) - f(x_*) = \tfrac12 (f''(x_c)(x - x_*),\, x - x_*)$$
and therefore, from (1.8), we obtain
$$\frac{m}{2}\|x - x_*\|^2 \le f(x) - f(x_*) \le \frac{M}{2}\|x - x_*\|^2. \tag{1.11}$$
We have from the left-hand side of inequality (1.11) and from (1.10)
$$\|x - x_*\| \le \frac{\|f'(x)\|}{m} \tag{1.12}$$
and from the right-hand side of inequality (1.11)
$$\|x - x_*\|^2 \ge \frac{2}{M}\,[f(x) - f(x_*)].$$
With the estimates obtained we can write (1.10) in the form
$$f(x) - f(x_*) \le \frac{\|f'(x)\|^2}{m} - \frac{m}{M}\,[f(x) - f(x_*)].$$
Hence
$$\|f'(x)\|^2 \ge m\left(1 + \frac{m}{M}\right)[f(x) - f(x_*)]. \tag{1.13}$$
Applying this estimate we can obtain from inequality (1.5)
$$f_{k+1} - f_k \le -\varepsilon\alpha_k m\left(1 + \frac{m}{M}\right)(f_k - f_*). \tag{1.14}$$

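Under the conditions of the theorem we have
$$f(x) - f(x_k) = (f'_k,\, x - x_k) + \tfrac12 (f''_{kc}(x - x_k),\, x - x_k) = -\alpha\|f'_k\|^2 + \frac{\alpha^2}{2}(f''_{kc} f'_k, f'_k) \le -\alpha\left(1 - \frac{\alpha M}{2}\right)\|f'_k\|^2.$$
It follows that inequality (1.2) is always satisfied if $1 - \frac{\alpha M}{2} \ge \varepsilon$, i.e. if $\alpha \le \bar\alpha = \frac{2(1-\varepsilon)}{M}$. Then we have from (1.14)
$$f_{k+1} - f_* \le \left[1 - \varepsilon\alpha_k m\left(1 + \frac{m}{M}\right)\right](f_k - f_*) \le q\,(f_k - f_*)$$
where $q = 1 - \varepsilon\bar\alpha m\left(1 + \frac{m}{M}\right) < 1$, i.e.
$$f_k - f_* \le q^k (f_0 - f_*). \tag{1.15}$$
Since $\bar\alpha = \frac{2(1-\varepsilon)}{M}$ we have
$$q = 1 - \frac{2\varepsilon(1-\varepsilon)m}{M}\left(1 + \frac{m}{M}\right).$$
Hence the minimum value of the ratio of the progression, $q_{\min}$, is attained with $\varepsilon = \frac12$ and then
$$q_{\min} = 1 - \frac{m}{2M}\left(1 + \frac{m}{M}\right).$$
Consequently it is expedient to take $\varepsilon = \frac12$ in condition (1.2).

Estimate (1.15) together with the left-hand one of estimates (1.11) makes it possible to establish the convergence and estimate the rate of convergence of the sequence $\{x_k\}$ to the point of minimum:
$$\|x_k - x_*\| \le \left(\frac{2}{m}\right)^{1/2}(f_k - f_*)^{1/2} \le \left(\frac{2}{m}\right)^{1/2}(f_0 - f_*)^{1/2} q^{k/2} \le C q^{k/2}.$$
The theorem is proved.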

Analyzing the above proof we see that in order to obtain estimate (1.15) we used in the end only conditions (1.2) and (1.13). We conclude that the class of functions for which estimate (1.15) holds is actually much broader than the class of functions which satisfy conditions (1.8), viz. estimate (1.15) is valid for all functions which satisfy the conditions of theorem 1.1 and moreover the condition
$$\|f'(x)\|^2 \ge \delta\,[f(x) - f_*], \qquad \delta > 0.$$
The proof of the validity of estimate (1.15) in this case is not really connected with the existence of a minimum; one can suppose that $f_* = \inf f(x)$ without trying to establish whether the precise lower bound is attained. It should be stressed however that functions of this class do have a minimum (not necessarily the only one) and that sequence $\{x_k\}$ converges to a certain point $x_*$, the second of estimates (1.9) holding true for the rate of convergence.

Indeed, from (1.1) and (1.7) we have
$$\|x_{k+1} - x_k\|^2 = \alpha_k^2\|f'_k\|^2 \le \frac{\alpha_k}{\varepsilon}(f_k - f_{k+1}) \le \frac{\alpha_{\max}}{\varepsilon}(f_k - f_*) \le \frac{\alpha_{\max}}{\varepsilon}\,q^k\,[f_0 - f_*] \le C_1 q^k$$
where $\alpha_{\max}$ is the maximum value of the parameter, from which we start to reduce $\alpha$. Taking this into account we obtain for any $m > k$
$$\|x_m - x_k\| \le \sum_{i=k}^{m-1}\|x_{i+1} - x_i\| \le C_1^{1/2}\sum_{i=k}^{m-1} q^{i/2} \le \frac{C_1^{1/2}\,q^{k/2}}{1 - q^{1/2}}.$$
Hence $\|x_m - x_k\| \to 0$ as $k \to \infty$, i.e. sequence $\{x_k\}$ converges (to a certain minimum $x_*$), and also
$$\|x_* - x_k\| = \lim_{m\to\infty}\|x_m - x_k\| \le \frac{C_1^{1/2}\,q^{k/2}}{1 - q^{1/2}} = C_2\,q^{k/2}.$$


Variants of the Method 


The method of choosing $\alpha_k$ in process (1.1) which involves the checking of inequality (1.2) as described above is not the only one possible. We shall now consider several other methods of choosing the value of $\alpha_k$; each of these methods determines a different variant of the gradient method. In proving theorems 1.1 and 1.2 it was established that inequality (1.2) is always satisfied with values of $\alpha \le \frac{1-\varepsilon}{R}$ (theorem 1.1) or $\alpha \le \frac{2(1-\varepsilon)}{M}$ (theorem 1.2). This circumstance made it possible to prove the statements about the properties of method (1.1) when $\alpha_k$ is chosen so as to satisfy inequality (1.2). If the constants $R$ or $M$ which characterize the function $f(x)$ being minimized are known, then in applying method (1.1) we can choose beforehand $\alpha_k = \alpha$, where $0 < \alpha \le \frac{1-\varepsilon}{R}$ or $0 < \alpha \le \frac{2(1-\varepsilon)}{M}$, and theorems 1.1 and 1.2 will remain valid. This variant of the gradient method makes it possible to determine more exactly the value of the ratio $q$ in the estimates of the rate of convergence (1.9).
Theorem 1.3. If function $f(x)$ satisfies the conditions of theorem 1.2 and in method (1.1) $\alpha_k = \alpha$, $0 < \alpha < \frac{2}{M}$, then for the rate of convergence of sequence $\{x_k\}$ the estimate
$$\|x_k - x_*\| \le q^k\,\|x_0 - x_*\|, \qquad q = \max\{|1 - \alpha m|,\ |1 - \alpha M|\}$$
is valid; the minimum value $q_{\min} = \frac{M-m}{M+m}$ is attained with $\alpha = \frac{2}{M+m}$.
Proof. We have
$$\|x_{k+1} - x_*\|^2 = (x_k - \alpha f'_k - x_*,\, x_{k+1} - x_*) = (x_k - x_* - \alpha (f'_k - f'_*),\, x_{k+1} - x_*).$$
Applying Lagrange's formula for operators (5.2) (Chap. I) we obtain
$$(\alpha (f'_k - f'_*),\, x_{k+1} - x_*) = (\alpha f''_{kc}(x_k - x_*),\, x_{k+1} - x_*).$$
Using this result we have
$$\|x_{k+1} - x_*\|^2 = ((I - \alpha f''_{kc})(x_k - x_*),\, x_{k+1} - x_*) \le \|I - \alpha f''_{kc}\|\,\|x_k - x_*\|\,\|x_{k+1} - x_*\|,$$
i.e.
$$\|x_{k+1} - x_*\| \le \|I - \alpha f''_{kc}\|\,\|x_k - x_*\| \le q\,\|x_k - x_*\|,$$
since by conditions (1.8)
$$\|I - \alpha f''_{kc}\| \le \max\{|1 - \alpha m|,\ |1 - \alpha M|\} = q.$$
In the interval $\left(0, \frac{2}{M}\right)$ the linear function $1 - \alpha M$, as we know, changes its sign. Therefore the minimum value $q_{\min}(\alpha)$ will be attained with $1 - \alpha m = -(1 - \alpha M)$, i.e. with $\alpha = \frac{2}{M+m}$, and obviously
$$q_{\min} = \frac{M - m}{M + m}.$$
The theorem is proved.
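
The estimate of theorem 1.3 can be checked numerically on a quadratic function. The sketch below assumes the hypothetical test function $f = \frac12(m x^2 + M y^2)$ with $m = 1$, $M = 10$ and minimum at the origin; the helper name `constant_step_descent` is likewise an assumption made for illustration.

```python
def constant_step_descent(grad, x0, alpha, steps):
    """Method (1.1) with a constant step alpha_k = alpha; returns all iterates."""
    history = [list(x0)]
    for _ in range(steps):
        x = history[-1]
        g = grad(x)
        history.append([xi - alpha * gi for xi, gi in zip(x, g)])
    return history


m, M = 1.0, 10.0
grad = lambda v: [m * v[0], M * v[1]]      # gradient of (m x^2 + M y^2) / 2
alpha = 2.0 / (M + m)                      # step for which q attains its minimum
q = (M - m) / (M + m)                      # predicted ratio of the progression

norm = lambda v: sum(vi * vi for vi in v) ** 0.5
hist = constant_step_descent(grad, [1.0, 1.0], alpha, 20)
ratios = [norm(b) / norm(a) for a, b in zip(hist, hist[1:])]
```

For this particular function each coordinate is multiplied by $1 - \alpha m$ or $1 - \alpha M$ at every step, both equal to $q$ in absolute value, so the observed ratios $\|x_{k+1} - x_*\| / \|x_k - x_*\|$ coincide with $q$ up to rounding.
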
Note that with $\alpha = \frac{2}{M+m}$ the first of estimates (1.9) can be defined more exactly:
$$f_{k+1} - f_* \le \left(\frac{M-m}{M+m}\right)^2 (f_k - f_*). \tag{1.16}$$

We describe another method of choosing the step length. One can select the value of $\alpha_k$ providing the minimum of the function in the direction of descent, i.e. the chosen value of $\alpha_k$ must satisfy the condition
$$f(x_k - \alpha_k f'_k) = \min_{\alpha \ge 0} f(x_k - \alpha f'_k). \tag{1.17}$$

With this method of choosing the step length all the above results concerning the properties of method (1.1) remain valid; moreover we obtain more exact bounds on the rate of convergence.

We prove a statement similar to that of theorem 1.1. 

Theorem 1.4. If function $f(x)$ satisfies the requirements of theorem 1.1 and in using method (1.1) $\alpha_k$ is chosen by (1.17), then $\|f'_k\| \to 0$ as $k \to \infty$ whatever the initial point $x_0$.

Proof. As in theorem 1.1 we obtain the estimate
$$f(x) - f(x_k) = -\alpha\|f'_k\|^2 - \alpha (f'_{kc} - f'_k,\, f'_k) \le -\alpha\|f'_k\|^2 + \alpha^2 R\,\|f'_k\|^2.$$
The minimum of the function $\varphi(\alpha) = -\alpha\|f'_k\|^2 + \alpha^2 R\,\|f'_k\|^2$ is attained with $\alpha_{\min} = \frac{1}{2R}$, and $\varphi(\alpha_{\min}) = -\frac{\|f'_k\|^2}{4R}$. Since $\alpha^2 R\,\|f'_k\|^2$ is the upper bound of the term $-\alpha (f'_{kc} - f'_k, f'_k)$, the value of $\alpha_k$ which satisfies condition (1.17) is obviously not less than $\alpha_{\min}$ and
$$f_{k+1} - f_k \le -\frac{\|f'_k\|^2}{4R}. \tag{1.18}$$
Hence by the same argument as in theorem 1.1 we find that $\|f'_k\| \to 0$. Q.E.D.

The estimates (1.9) for the variant of the gradient method where the step length is chosen according to (1.17) can be verified in a way analogous to that used in theorem 1.2, with the only difference that expression (1.13) should be used in the estimate $f_{k+1} - f_k \le -\frac{1}{2M}\|f'_k\|^2$ obtained by the same argument as for (1.18). However, we proceed using the results of theorem 1.3. This ensures higher accuracy for the value of the ratio $q$.

Set $\tilde x_{k+1} = x_k - \frac{2}{M+m} f'_k$; then estimate (1.16) holds:
$$f(\tilde x_{k+1}) - f(x_*) \le \left(\frac{M-m}{M+m}\right)^2 (f_k - f_*).$$
If point $x_{k+1}$ is chosen by applying the condition for function minimization in the direction of descent, then
$$f(x_{k+1}) - f(x_*) \le f(\tilde x_{k+1}) - f(x_*) \le \left(\frac{M-m}{M+m}\right)^2 (f_k - f_*) \le \left(\frac{M-m}{M+m}\right)^{2(k+1)} (f_0 - f_*).$$
Using now estimates (1.11) we obtain
$$\|x_{k+1} - x_*\|^2 \le \frac{2}{m}(f_{k+1} - f_*) \le \frac{M}{m}\left(\frac{M-m}{M+m}\right)^{2(k+1)}\|x_0 - x_*\|^2$$
and finally
$$\|x_{k+1} - x_*\| \le C\left(\frac{M-m}{M+m}\right)^{k+1}$$
where
$$C = \left(\frac{M}{m}\right)^{1/2}\|x_0 - x_*\|.$$
Thus the following theorem is valid.

Theorem 1.5. If function $f(x)$ satisfies the conditions of theorem 1.2 and in applying method (1.1) $\alpha_k$ is chosen according to condition (1.17), then sequence $\{x_k\}$ converges to the minimum at the rate of a geometric progression whose ratio is $q = \frac{M-m}{M+m}$.

Note that the variant of method (1.1) where the step length is chosen according to the condition of function minimization in the direction of descent is often called in the literature the method of steepest descent.
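
For a quadratic function the one-dimensional minimization (1.17) has a closed-form solution, which makes the rate of theorem 1.5 easy to observe. The sketch below assumes a hypothetical diagonal quadratic $f(x) = \frac12 \sum_i d_i x_i^2$, for which the minimizing step along $-f'_k$ is $\alpha_k = (g, g)/(g, Ag)$ with $g = f'_k$ and $A = \mathrm{diag}(d_i)$; the function name and the starting point are illustrative choices.

```python
def steepest_descent_quadratic(diag, x0, steps):
    """Steepest descent for f(x) = 0.5 * sum(d_i * x_i^2) with exact line search."""
    f = lambda v: 0.5 * sum(d * vi * vi for d, vi in zip(diag, v))
    x = list(x0)
    values = [f(x)]
    for _ in range(steps):
        g = [d * xi for d, xi in zip(diag, x)]              # gradient A x
        gg = sum(gi * gi for gi in g)
        if gg == 0.0:                                       # already at the minimum
            break
        gAg = sum(d * gi * gi for d, gi in zip(diag, g))
        alpha = gg / gAg                                    # minimizer of f(x - alpha g)
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
        values.append(f(x))
    return x, values


m, M = 1.0, 10.0
q = (M - m) / (M + m)
x_final, values = steepest_descent_quadratic([m, M], [10.0, 1.0], 15)
# Each step should satisfy f_{k+1} - f_* <= q^2 (f_k - f_*), with f_* = 0 here.
worst = max(b / a for a, b in zip(values, values[1:]))
```

The starting point $(M, m)$ is chosen deliberately: for a two-dimensional quadratic it is a known worst case, so the observed reduction per step is expected to sit at the bound $q^2$ rather than well below it.
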


Other Gradient Methods 


Let $F(x)$ be an arbitrary symmetric matrix which satisfies the conditions
$$\rho\|y\|^2 \le (F(x)y, y) \le \Lambda\|y\|^2, \qquad \rho > 0 \tag{1.19}$$
for any $x, y \in E^n$. If we choose vector $p = -F(x) f'(x)$, then $(f'(x), p) = -(f', Ff') \le -\rho\|f'\|^2 < 0$ provided $\|f'(x)\| \ne 0$. Thus vector $p = -F(x) f'(x)$ determines a direction of decrease of function $f(x)$. Then in order to minimize $f(x)$ we can construct the iterative process
$$x_{k+1} = x_k - \alpha_k F_k f'(x_k), \qquad \alpha_k > 0,\ k = 0, 1, \ldots$$
where $\{F_k\}$ is a sequence of arbitrary matrices which satisfy conditions (1.19). To connect this process with the material of the following part of this chapter (more precisely, to keep the notation consistent), we shall consider the process
$$x_{k+1} = x_k - \alpha_k F_k^{-1} f'_k, \qquad \alpha_k > 0 \tag{1.20}$$
in which the matrix is the inverse of matrix $F_k$. This does not affect the substance of the matter, for if matrix $F_k$ satisfies conditions (1.19), then for matrix $F_k^{-1}$ the conditions (cf. lemma 2.9 of Ch. I)
$$m_1\|y\|^2 \le (F_k^{-1}y, y) \le M_1\|y\|^2, \qquad m_1 = \frac{1}{\Lambda} > 0,\ M_1 = \frac{1}{\rho} \tag{1.21}$$
will be satisfied and therefore, with $p_k = -F_k^{-1} f'_k$,
$$(f'_k, p_k) = -(f'_k, F_k^{-1} f'_k) \le -m_1\|f'_k\|^2 < 0. \tag{1.22}$$
Different iterative processes will correspond to different sequences $\{F_k\}$.

As far as the principles of the methods are concerned, the study of method (1.20) does not involve any new elements as compared to the “pure” gradient method (1.1). All the results obtained for method (1.1) remain valid also for method (1.20) with the same requirements on the function being minimized and the same methods of choosing the step length. Only the technique of proving the corresponding statements is slightly changed. Of course, the quantitative values of the parameters in (1.20) will differ from the values of the analogous parameters in method (1.1). In particular, this relates to the value of the ratio $q$ in the estimates of the rate of convergence.

We shall dwell now only on those results for method (1.20) which will be made use of later on.

Theorem 1.6. The results of theorem 1.2 remain valid for method (1.20).

Proof. If $x = x_k + \alpha p_k$, where $p_k = -F_k^{-1} f'_k$, then
$$f(x) - f(x_k) = \alpha (f'_k, p_k) + \frac{\alpha^2}{2}(f''_{kc} p_k, p_k) \le \alpha (f'_k, p_k)\left(1 + \frac{\alpha M}{2}\,\frac{\|p_k\|^2}{(f'_k, p_k)}\right).$$
Now using (1.19) we have
$$(f'_k, p_k) = -(F_k p_k, p_k) \le -\rho\|p_k\|^2. \tag{1.23}$$
Consequently
$$f(x) - f(x_k) \le \alpha (f'_k, p_k)\left(1 - \frac{\alpha M}{2\rho}\right).$$
Hence inequality (1.2) will be satisfied without fail if $1 - \frac{\alpha M}{2\rho} \ge \varepsilon$, i.e. if $\alpha \le \bar\alpha = \frac{2(1-\varepsilon)\rho}{M}$. This substantiates the method of choosing $\alpha_k$.

Since $(f'_k, p_k) < 0$ when $\|f'_k\| \ne 0$, it follows from the condition
$$f_{k+1} - f_k \le \varepsilon\alpha_k (f'_k, p_k) \tag{1.24}$$
that $f_{k+1} < f_k$. Using now (1.24) and bearing in mind that $f(x)$ is bounded from below, by analogy with the proof in theorem 1.1 that $\|f'_k\| \to 0$, we establish that $(f'_k, p_k) \to 0$ as $k \to \infty$. By (1.22), this means that $\|f'_k\| \to 0$. Hence, since $f(x)$ is strongly convex, sequence (1.20) converges to the solution $x_*$. In order to obtain bounds on the rate of convergence of $f_k - f_*$, $x_k - x_*$, let us write inequality (1.24), using (1.22), in the form $f_{k+1} - f_k \le -\varepsilon\alpha_k m_1\|f'_k\|^2$. Further, estimating $\|f'_k\|$ in this expression with the aid of inequality (1.13) and applying the argument of theorem 1.2, we establish the validity for method (1.20) of the estimates of the convergence rate (1.9). The value of the ratio of the progression is
$$q = 1 - \varepsilon\bar\alpha m_1 m\left(1 + \frac{m}{M}\right) = 1 - \frac{2\varepsilon(1-\varepsilon)\rho}{M}\,m_1 m\left(1 + \frac{m}{M}\right).$$
The minimum of $q$ is attained with $\varepsilon = \frac12$:
$$q_{\min} = 1 - \frac{\rho\,m_1 m}{2M}\left(1 + \frac{m}{M}\right).$$


The theorem is proved.

It follows from the proof that process (1.20) remains convergent if we set $\alpha_k = \alpha$, $0 < \alpha < \frac{2\rho}{M}$ (a variant of the method with constant step). By the same argument as in theorem 1.3, one can obtain the estimate
$$\|x_{k+1} - x_*\| \le \|I - \alpha F_k^{-1} f''_{kc}\|\,\|x_k - x_*\|.$$
However it is impossible to obtain an estimate of the ratio of the progression as was done in theorem 1.3, since the matrix $F_k^{-1} f''_{kc}$ is not positive definite in the general case (this last property is fulfilled only on condition that matrices $F_k^{-1}$ and $f''(x_{kc})$ commute).

We can consider a variant of method (1.20) in which the step length is chosen using the condition that $f(x)$ attains its minimum in the direction of descent.

Theorem 1.7. If function $f(x)$ satisfies the conditions of theorem 1.2 and in applying method (1.20) parameter $\alpha_k$ is chosen using the condition
$$f(x_k + \alpha_k p_k) = \min_{\alpha \ge 0} f(x_k + \alpha p_k),$$
then sequence $\{x_k\}$ converges to the point of minimum at the rate of a geometric progression.

The theorem can be proved according to the following scheme. Expanding the function into Taylor's series to second-order terms about point $x_k$ and reasoning as in theorem 1.4 we can obtain the estimate
$$f_{k+1} - f_k \le -\frac{1}{2M}\,\frac{(f'_k, p_k)^2}{\|p_k\|^2}.$$
This inequality, because of (1.22) and (1.23), yields
$$f_{k+1} - f_k \le -\frac{\rho\,m_1}{2M}\,\|f'_k\|^2.$$
Further, estimating $\|f'_k\|$ with the use of inequality (1.13), one should repeat completely the argument of theorem 1.2. We cannot obtain in this case a more precise value of the ratio $q$ since we know it must be greater than in the method of steepest descent.


Qualitative Analysis of the Methods 


Let us compare the gradient methods considered above and consider certain statements on the quality of these algorithms, i.e. on their effectiveness in solving minimization problems.

We have studied three variants of method (1.1) differing in the method of choosing the step length. The properties of the variants closely resemble one another. They can be used in minimizing functions of similar classes, their rates of convergence being nearly the same (in the cases where they can be estimated). Consequently, in solving problems it is expedient to use the variant of the method which requires the least labour. The computational effort at each iteration in the variants of process (1.1) can evidently differ only owing to the difference in the method of choosing parameter $\alpha_k$. The variant of the method which uses a constant step $\alpha_k = \alpha$ requires the least amount of computation per iteration (for in this case it is necessary to calculate only the gradient $f'(x_k)$). However, in most problems such a method of choosing $\alpha_k$ is practically impossible, since the values of the constants $R$, $M$ characterizing the function are usually unknown.

Let us compare the amounts of work required by the methods of choosing the step length; this is connected with the checking of conditions (1.2) and (1.17). We have established that if the function $f(x)$ satisfies certain requirements (theorems 1.1, 1.2), then inequality (1.2) is always satisfied, at least with sufficiently small values of $\alpha_k$ (which are determined by the values of constants $R$, $M$). By virtue of this, whatever the value $\alpha_{\max}$ from which we begin to verify inequality (1.2), the inequality will be satisfied after a certain finite number of reductions of the parameter, i.e. we have to calculate the function value a finite number of times to choose the required value of $\alpha_k$. As to the choice of $\alpha_k$ using condition (1.17), it is in the general case a procedure with an infinite number of possible values.

Of course, in practice we have to determine a point of minimum 
in the direction of descent by evaluating the function also a finite 
number of times. Clearly for a more or less precise solution of a one- 
dimensional minimization problem we have to perform more calcu- 
lations of the function value than for satisfying inequality (1.2). 
The above considerations show that one should prefer the method of 
choosing the step length which involves checking of inequality (1.2). 

All of the above remarks can be applied to method (1.20) too. 

The reader will surely understand that the above arguments are based only on the most general properties of the function being minimized and of the algorithms being studied, and do not make use of the specific properties of concrete functions. Therefore the above recommendations should not be considered as absolutely applicable. This remark should be kept in mind in what follows.

We turn now to the discussion of the effectiveness of gradient methods. From the point of view of solving problems of function minimization, for sufficiently good functions (smooth, convex) gradient methods give convergence to the minimum at the rate of a geometric progression. The value of the ratio of the progression, in particular for strongly convex functions, depends on the maximum $M$ and minimum $m$ eigenvalues of the matrix of second derivatives of function $f(x)$. The ratio $q$ will be sufficiently small only when the eigenvalues $m$ and $M$ are but slightly different, i.e. when the matrix $f''(x)$ is well conditioned. In this case the rate of convergence is fast. However in computational practice such problems occur very seldom. As a rule we have to find the minimum of functions whose matrix $f''(x)$ is ill conditioned ($\frac{m}{M} \ll 1$). The smaller the ratio $\frac{m}{M}$, the closer to unity will be the ratio $q$ of the progression and the slower the rate of convergence. This fact can be given a geometric interpretation. As the ratio $\frac{m}{M}$ diminishes, the level surfaces of the function being minimized (i.e. the surfaces $f(x) = C$) become more elongated and the direction of vector $f'(x)$ at most points deviates more and more from the direction to the point of minimum. This leads to slowing down the rate of convergence. This can be particularly well visualized by considering, for instance, a strictly convex quadratic function $f(x)$ in space $E^2$, e.g. $f = \frac12\left(\frac{x^2}{a^2} + \frac{y^2}{b^2}\right)$. The matrix of second derivatives of this function has constant elements, its level surfaces are ellipses $\frac12\left(\frac{x^2}{a^2} + \frac{y^2}{b^2}\right) = C$, and the point of minimum coincides with the centre of the ellipses. The eigenvalues of the matrix of second derivatives are $\frac{1}{a^2}$ and $\frac{1}{b^2}$. The more the ratio $\frac{a}{b}$ differs from unity, the more the level lines are extended along one of the axes $OX$ or $OY$ and the greater is the number of steps in the direction of the antigradient which have to be taken in moving from an arbitrary point $(X_0, Y_0)$ in order to attain a sufficiently small neighbourhood of the minimum point.
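
The dependence of the number of steps on the elongation of the ellipses can be observed directly. The sketch below assumes the quadratic $f = \frac12\left(\frac{x^2}{a^2} + \frac{y^2}{b^2}\right)$, the constant step $\alpha = \frac{2}{M+m}$ of theorem 1.3, and a hypothetical starting point $(1, 1)$ and tolerance; `iterations_to_converge` is an illustrative name.

```python
def iterations_to_converge(a, b, x0=(1.0, 1.0), tol=1e-6, max_iter=1_000_000):
    """Count antigradient steps needed to bring the iterate within tol of the minimum."""
    m, M = sorted((1.0 / a ** 2, 1.0 / b ** 2))   # eigenvalues of the second-derivative matrix
    alpha = 2.0 / (M + m)                          # constant step from theorem 1.3
    x, y = x0
    for k in range(max_iter):
        if (x * x + y * y) ** 0.5 < tol:
            return k
        x -= alpha * x / a ** 2                    # gradient components: x / a^2, y / b^2
        y -= alpha * y / b ** 2
    return max_iter


# Ratio a/b = 1, 3, 10, 30: the level lines become progressively more elongated.
counts = {ratio: iterations_to_converge(1.0, 1.0 / ratio) for ratio in (1, 3, 10, 30)}
```

For circular level lines ($a = b$) a single step reaches the minimum, while the count grows rapidly with $a/b$, in line with $q = \frac{M-m}{M+m}$ approaching unity.
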

The slow convergence of gradient methods prevents their being used in solving complicated minimization problems, since too much time is required even with the use of modern high-speed computers. Therefore minimization methods which have a faster rate of convergence have been and are being worked out, and gradient methods are often used in combination with other, more effective ones at the initial stage of solving the problem, when point $x_k$ is at a great distance from the minimum and steps along the antigradient make it possible to obtain a significant decrease of the function. We also stress once more the unmistakable advantages of gradient methods and their suitability for minimizing functions of very different characters.

2. NEWTON’S METHOD
WITH STEP ADJUSTMENT


Construction of the Method 


In gradient methods only the linear term of the expansion of 
the function in Taylor’s series is used in choosing the direction of 
motion, i.e. use is made of the crudest approximation to the function 
being minimized. 

Let function $f(x)$ whose minimum is to be determined be strictly convex and sufficiently smooth.

Consider the function
$$\psi(x) = f(y) + (f'(y),\, x - y) + \tfrac12 (f''(y)(x - y),\, x - y)$$
which is a quadratic approximation to $f(x)$ in the neighbourhood of a certain point $y$. Since function $f(x)$ is strictly convex, function $\psi(x)$, as can easily be ascertained, is also strictly convex; therefore the minimum of this function is attained at a unique point, and the vector $p = x - y$ which minimizes $\psi(x)$ is determined from the formula $p = -(f''(y))^{-1} f'(y)$. The direction determined by vector $p$ is that of descent of $f(x)$, since $(f'(y), p) = -(f''(y)p, p) < 0$ by virtue of $f(x)$ being convex. The quadratic function $\psi(x)$ in the neighbourhood of point $y$ is a far better approximation to the function being minimized than a linear function. Therefore one naturally expects, at least if point $y$ is in a sufficiently small neighbourhood of solution $x_*$, that by moving from point $y$ in the direction $p = -(f''(y))^{-1} f'(y)$ one can attain a more significant decrease of the function and obtain a more accurate approximation to the solution than by moving in the direction $-f'(y)$ which is used in the gradient method. On the ground of the above argument we suppose that the iterative process
$$x_{k+1} = x_k - \alpha_k (f''_k)^{-1} f'_k, \qquad \alpha_k > 0,\ k = 0, 1, \ldots \tag{2.1}$$
when used to construct successive approximations to the solution of the problem of minimization of function $f(x)$ will prove more effective than the method of steepest descent, i.e. that the rate of convergence of $x_k - x_*$, $f(x_k) - f(x_*)$ when using algorithm (2.1) will be faster than when applying the gradient method. The results of this section will show that our expectation is justified.

We shall call method (2.1) Newton's method with adjustment of steps, or the generalized Newton method.

The usual Newton method corresponds to the case when $\alpha_k = 1$.

Denoting the elements of matrix $(f''_k)^{-1}$ by $\varphi_{ij}(x_k)$, $i, j = 1, 2, \ldots, n$, where $i$ is the row index, we can write method (2.1) in its coordinate form:
$$x^i_{k+1} = x^i_k - \alpha_k \sum_{j=1}^{n} \varphi_{ij}(x_k)\,\frac{\partial f(x_k)}{\partial x^j}, \qquad i = 1, \ldots, n.$$
Note that method (2.1) can also be presented in the following form:
$$f''_k\,p_k = -f'_k, \qquad x_{k+1} = x_k + \alpha_k p_k$$
or in coordinate form
$$\sum_{j=1}^{n} \frac{\partial^2 f(x_k)}{\partial x^i\,\partial x^j}\,p^j_k = -\frac{\partial f(x_k)}{\partial x^i}, \qquad x^i_{k+1} = x^i_k + \alpha_k p^i_k, \qquad i = 1, \ldots, n.$$
Consequently, in order to determine vector $p_k$ one can solve a system of linear equations instead of inverting the matrix $f''(x_k)$.
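
The remark about solving $f''_k p_k = -f'_k$ rather than inverting $f''_k$ can be illustrated as follows. The sketch uses plain Gaussian elimination with partial pivoting written in standard-library Python; the helper names `solve_linear` and `newton_direction`, and the sample matrix and gradient, are assumptions made for illustration only.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (A given as list of rows)."""
    n = len(b)
    aug = [list(row) + [bi] for row, bi in zip(A, b)]       # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                            # back substitution
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def newton_direction(hess, grad, x):
    """Direction p_k obtained from the linear system f''(x) p = -f'(x)."""
    return solve_linear(hess(x), [-g for g in grad(x)])


# Hypothetical data: constant second-derivative matrix and gradient at some point.
hess = lambda x: [[4.0, 1.0], [1.0, 3.0]]
grad = lambda x: [1.0, 2.0]
p = newton_direction(hess, grad, [0.0, 0.0])
```

For this sample system the solution is $p = (-\frac{1}{11}, -\frac{7}{11})$, which can be confirmed by substituting it back into $f'' p = -f'$.
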

We shall study two variants of the generalized Newton method in which different methods of choosing parameter $\alpha_k$ will be used. The first of these methods consists of the following four steps:

(1) Setting $\alpha = 1$ calculate point $x = x_k + \alpha p_k$.

(2) Evaluate $f(x) = f(x_k + \alpha p_k)$.

(3) Check the inequality
$$f(x) - f(x_k) \le \varepsilon\alpha\,(f'_k, p_k), \qquad 0 < \varepsilon < \tfrac12. \tag{2.2}$$

(4) If the inequality is satisfied, then take the value $\alpha = 1$ to be the sought one: $\alpha_k = 1$. Otherwise proceed to reduce $\alpha$ until inequality (2.2) is satisfied.
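
A minimal sketch of the four steps for a function of two variables is given below. It is an illustration under several assumptions: the names `damped_newton`, `eps`, `lam`, the use of Cramer's rule for the 2 by 2 system, the halving of $\alpha$, and the test function $f = \sqrt{1 + x^2} - 1 + 0.05x^2 + 0.5y^2$ (written in a numerically stable form) are all hypothetical choices, not taken from the text.

```python
def damped_newton(f, grad, hess, x0, eps=0.25, lam=0.5, tol=1e-10, max_iter=100):
    """Method (2.1) in two variables; alpha_k is chosen by checking inequality (2.2)."""
    x, y = x0
    steps = []
    for _ in range(max_iter):
        g1, g2 = grad(x, y)
        if (g1 * g1 + g2 * g2) ** 0.5 <= tol:
            break
        (a, b), (c, d) = hess(x, y)
        det = a * d - b * c
        p1 = (-g1 * d + g2 * b) / det            # Cramer's rule for f'' p = -f'
        p2 = (-g2 * a + g1 * c) / det
        slope = g1 * p1 + g2 * p2                # (f'_k, p_k), negative for convex f
        alpha, f0 = 1.0, f(x, y)                 # step (1): start from alpha = 1
        while f(x + alpha * p1, y + alpha * p2) - f0 > eps * alpha * slope:
            alpha *= lam                         # step (4): reduce alpha
        x, y = x + alpha * p1, y + alpha * p2
        steps.append(alpha)
    return (x, y), steps


def f(x, y):
    # sqrt(1 + x^2) - 1 rewritten to avoid cancellation for small x
    return x * x / (1.0 + (1.0 + x * x) ** 0.5) + 0.05 * x * x + 0.5 * y * y

def grad(x, y):
    return x / (1.0 + x * x) ** 0.5 + 0.1 * x, y

def hess(x, y):
    return ((1.0 + x * x) ** -1.5 + 0.1, 0.0), (0.0, 1.0)

(x_min, y_min), steps = damped_newton(f, grad, hess, (3.0, 1.0))
```

From the starting point $(3, 1)$ the full Newton step increases this function, so the first accepted value of $\alpha_k$ is smaller than unity; close to the minimum the step $\alpha_k = 1$ is accepted, which is the behaviour established in theorem 2.1 below.
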

In what follows we shall refer to the above method of choosing the value of $\alpha_k$ as the method of choosing $\alpha_k$ according to condition (2.2). It can be seen that this method of choosing the step length is analogous to that in the method of steepest descent, involving the checking of inequality (1.2).

The other variant of method (2.1) requires that the value of $\alpha_k$ provide the minimum of the function in the direction of motion:
$$f(x_k - \alpha_k (f''_k)^{-1} f'_k) = \min_{\alpha \ge 0} f(x_k - \alpha (f''_k)^{-1} f'_k). \tag{2.3}$$


Theorems about Properties of the Method 


As follows from formula (2.1), Newton's method can only be applied to the minimization of functions that have an invertible matrix of second derivatives, and the matrix $(f''_k)^{-1}$, as will be shown later on, must be bounded.

Strongly convex twice continuously differentiable functions possess such properties. Therefore in this section we always assume the function $f(x)$ to satisfy the following conditions:
$$m\|y\|^2 \le (f''(x)y, y) \le M\|y\|^2, \qquad m > 0 \tag{2.4}$$
for any $x, y \in E^n$. Recall that such functions have a lower bound and a unique minimum point $x_*$.

Theorem 2.1. If in minimizing function $f(x)$ that satisfies conditions (2.4) use is made of method (2.1) and $\alpha_k$ is chosen according to condition (2.2), then whatever the initial point $x_0$ the sequence $\{x_k\}$ converges to the minimum point at a superlinear rate
$$\|x_{N+l+1} - x_*\| \le C\,\lambda_N \lambda_{N+1} \cdots \lambda_{N+l} \tag{2.5}$$
where $N, C < \infty$, $\lambda_{N+l} < 1$ for any $l \ge 0$, $\lambda_i \to 0$ as $i \to \infty$.

Proof. Method (2.1) can be considered as a process of the gradient type (1.20), assuming $F_k^{-1} = (f''_k)^{-1}$. Since matrix $f''_k$ possesses the required properties, the convergence of method (2.1) to the solution follows from the general results pertaining to the convergence of gradient methods (theorem 1.6).

Let us establish the validity of estimate (2.5). Note first of all that
$$(f'_k, p_k) = -(f''_k p_k, p_k) \le -m\|p_k\|^2. \tag{2.6}$$
Since $(f'_k, p_k) < 0$ and $(f'_k, p_k) \to 0$ (theorem 1.6), it follows from (2.6) that $\|p_k\| \to 0$ as $k \to \infty$. Let us show now that from a certain iteration on in method (2.1) we have $\alpha_k = 1$. Using Taylor's formula and (2.6) we obtain
$$f_{k+1} - f_k = \alpha_k (f'_k, p_k) + \frac{\alpha_k^2}{2}(f''_k p_k, p_k) + \frac{\alpha_k^2}{2}((f''_{kc} - f''_k)p_k, p_k) \le \alpha_k (f'_k, p_k)\left(1 - \frac{\alpha_k}{2} - \frac{\alpha_k}{2}\,\frac{\|f''_{kc} - f''_k\|}{m}\right)$$
where $x_{kc} = x_k + \theta (x_{k+1} - x_k)$, $\theta \in [0, 1]$. Since $\|x_k - x_*\| \to 0$, we have as $k \to \infty$
$$\|f''_{kc} - f''_k\| \le \|f''_{kc} - f''_*\| + \|f''_* - f''_k\| \to 0$$
by virtue of the (operator) function $f''(x)$ being continuous. Then for any constant $0 < \varepsilon < \frac12$ there is a number $N_0(\varepsilon)$ such that with $k \ge N_0(\varepsilon)$ the condition
$$1 - \frac{\alpha_k}{2} - \frac{\alpha_k}{2}\,\frac{\|f''_{kc} - f''_k\|}{m} \ge \varepsilon$$
will be satisfied with $\alpha_k = 1$. This means that inequality (2.2) will also be satisfied with $\alpha_k = 1$. Thus the method used in choosing the step length under the conditions of the theorem guarantees that method (2.1) from a certain iteration on will be applied with a step equal to unity, i.e. will be transformed into the usual Newton method. We can now obtain bounds on the rate of convergence of the method:
$$(x_{k+1} - x_*,\, x_{k+1} - x_*) = (x_k - x_* - (f''_k)^{-1} f'_k,\, x_{k+1} - x_*).$$
By Lagrange's formula for operators we have
$$((f''_k)^{-1} f'_k,\, x_{k+1} - x_*) = ((f''_k)^{-1}(f'_k - f'_*),\, x_{k+1} - x_*) = ((f''_k)^{-1} f''_{kc}(x_k - x_*),\, x_{k+1} - x_*)$$
where $x_{kc} = x_* + \theta (x_k - x_*)$, $\theta \in [0, 1]$. Consequently
$$\|x_{k+1} - x_*\|^2 = ((I - (f''_k)^{-1} f''_{kc})(x_k - x_*),\, x_{k+1} - x_*) = ((f''_k)^{-1}(f''_k - f''_{kc})(x_k - x_*),\, x_{k+1} - x_*) \le \frac{1}{m}\,\|f''_k - f''_{kc}\|\,\|x_k - x_*\|\,\|x_{k+1} - x_*\|$$
or
$$\|x_{k+1} - x_*\| \le \lambda_k\,\|x_k - x_*\| \tag{2.7}$$
where $\lambda_k = \frac{1}{m}\|f''_k - f''_{kc}\|$. Since $\|f''_k - f''_{kc}\| \to 0$, there is a number $N$ such that with $k = N + l$, $l = 0, 1, \ldots$ we have $\lambda_{N+l} < 1$ and $\lambda_{N+l} \to 0$ as $l \to \infty$. Setting $\|x_N - x_*\| = C$ and taking into account the above remarks we obtain estimate (2.5).

The theorem is proved.

Let us suppose now that matrix $f''(x)$ satisfies, besides conditions (2.4), also the Lipschitz condition
$$\|f''(x) - f''(y)\| \le R\,\|x - y\|, \qquad x, y \in E^n. \tag{2.8}$$
In this case we have in estimate (2.7)
$$\lambda_k = \frac{1}{m}\,\|f''_k - f''_{kc}\| \le \frac{R}{m}\,\|x_k - x_*\|$$
and therefore
$$\|x_{k+1} - x_*\| \le \frac{R}{m}\,\|x_k - x_*\|^2. \tag{2.9}$$
Consequently the following theorem is valid.

Theorem 2.2. If function $f(x)$ is such that conditions (2.4) and (2.8) are satisfied, then sequence (2.1) in which the values of $\alpha_k$ are chosen according to condition (2.2), whatever the initial point $x_0$, converges to the solution at a quadratic rate, i.e. estimate (2.9) is valid.

Estimate (2.9) can be written also in the following form. We set $\mu_k = \frac{R}{m}\|x_k - x_*\|$. There is a number $L$ such that with $k = L + l$, $l = 0, 1, \ldots$ we have $\mu_{L+l} < 1$ and
$$\frac{R}{m}\,\|x_{L+l+1} - x_*\| \le \left(\frac{R}{m}\,\|x_{L+l} - x_*\|\right)^2 \le \ldots \le \left(\frac{R}{m}\,\|x_L - x_*\|\right)^{2^{l+1}}.$$
Finally we can write
$$\|x_{L+l+1} - x_*\| \le \frac{m}{R}\,\mu_L^{2^{l+1}}.$$
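
The quadratic rate (2.9) can be observed on a scalar example. The sketch below assumes the hypothetical function $f(x) = \frac{x^2}{2} + e^x$, for which $f''(x) = 1 + e^x \ge 1$, and runs the usual Newton method ($\alpha_k = 1$) from $x_0 = 0$; since $x_*$ is not known in closed form, the last iterate is used as a proxy for it, which presumes that the iteration has converged.

```python
import math


def newton_errors(dfun, d2fun, x0, steps):
    """Run Newton's method with alpha_k = 1 and return |x_k - x_ref| for each iterate."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] - dfun(xs[-1]) / d2fun(xs[-1]))
    x_ref = xs[-1]                       # proxy for x_* (assumes convergence)
    return [abs(x - x_ref) for x in xs[:-1]]


errs = newton_errors(lambda x: x + math.exp(x),       # f'(x)
                     lambda x: 1.0 + math.exp(x),     # f''(x)
                     0.0, 10)
# Ratios e_{k+1} / e_k^2 should stay bounded, as estimate (2.9) predicts.
ratios = [errs[k + 1] / errs[k] ** 2 for k in range(3)]
```

The ratios play the role of the constant $\frac{R}{m}$ in (2.9); only the first few are computed, because later errors fall to the level of rounding and the quotient loses meaning.
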


Let us consider now the variant of method (2.1) with the step length being chosen according to condition (2.3). The convergence of sequence $\{x_k\}$ to the solution in this case follows from the general results about the convergence of gradient methods (theorem 1.7). The rate of convergence, as in the case of choosing $\alpha_k$ according to condition (2.2), will be superlinear if condition (2.4) is satisfied and quadratic if condition (2.8) is also satisfied. This can be proved as follows.

Let $\tilde x_{k+1} = x_k - (f''_k)^{-1} f'_k$ and $x_{k+1} = x_k - \alpha_k (f''_k)^{-1} f'_k$, where $\alpha_k$ is chosen according to condition (2.3). Then using estimate (1.11) we obtain
$$\frac{m}{2}\,\|x_{k+1} - x_*\|^2 \le f_{k+1} - f_* \le f(\tilde x_{k+1}) - f_* \le \frac{M}{2}\,\|\tilde x_{k+1} - x_*\|^2.$$
By (2.7), $\|\tilde x_{k+1} - x_*\| \le \lambda_k\,\|x_k - x_*\|$, $\lambda_k \to 0$ as $k \to \infty$. Consequently, if conditions (2.4) are satisfied, then
$$\|x_{k+1} - x_*\| \le \left(\frac{M}{m}\right)^{1/2}\lambda_k\,\|x_k - x_*\| = \gamma_k\,\|x_k - x_*\| \tag{2.10}$$
where $\gamma_k = \left(\frac{M}{m}\right)^{1/2}\lambda_k \to 0$ as $k \to \infty$. If estimate (2.8) holds, then
$$\lambda_k \le \frac{R}{m}\,\|x_k - x_*\|, \qquad \|x_{k+1} - x_*\| \le \left(\frac{M}{m}\right)^{1/2}\frac{R}{m}\,\|x_k - x_*\|^2. \tag{2.11}$$

Modifications of the Generalized 
Newton Method 


As one of the possible modifications of method (2.1) we shall consider an algorithm in which the sequence of approximations to the solution is constructed by the following formula:
$$x_{k+1} = x_k - \alpha_k (f''_0)^{-1} f'_k, \qquad \alpha_k > 0. \tag{2.12}$$
In this method $p_k = -(f''_0)^{-1} f'_k$, i.e. in order to determine the directions of descent use is made of one and the same matrix $(f''_0)^{-1}$. Method (2.12) is a particular case of algorithm (1.20) ($F_k^{-1} = (f''_0)^{-1}$). Therefore it can be asserted that sequence (2.12), whatever the initial point $x_0$, will converge to the solution at the rate of a geometric progression both if the step length is chosen according to condition (2.2) and if it is chosen using condition (2.3) (theorems 1.6 and 1.7). However the value of the ratio $q$, i.e. the actual rate of convergence, will depend significantly on the initial approximation chosen, $x_0$.

In fact, taking into account that in method (2.12) $(f'_k, p_k) = -(f'_k, (f''_0)^{-1} f'_k)$
and using Taylor's formula, we can obtain the following bound
(as in theorem 2.1):

$$f_{k+1} - f_k \le \alpha_k (f'_k, p_k) \left(1 - \frac{\alpha_k}{2} - \frac{\alpha_k}{2m}\,\|f''_{kc} - f''_0\|\right) \tag{2.13}$$

where $x_{kc} = x_k + \theta (x_{k+1} - x_k)$, $\theta \in [0, 1]$. If $x_0 \to x_*$, then since
the matrix of second derivatives is continuous we have

$$\max_{x \in S} \|f''(x) - f''(x_0)\| \to 0$$

($S = \{x : f(x) \le f(x_0)\}$). Thus the closer the initial point $x_0$ is
to the point $x_*$, the greater will be the values of $\alpha_k$ which satisfy
inequality (2.2), i.e. the greater will be the step in process (2.12)
if the step length is chosen according to condition (2.2). In particular,
for any constant $0 < \varepsilon < \frac{1}{2}$ there is a constant $\rho(\varepsilon)$ such
that if the initial approximation $x_0$ is chosen in a sphere of
radius $\rho$ about $x_*$, we shall have

$$1 - \frac{1}{2} - \frac{1}{2m} \max_{x \in S} \|f''(x) - f''_0\| \ge \varepsilon .$$



This means, by (2.13), that provided the initial approximation
is chosen sufficiently close to the point $x_*$, inequality (2.2) will be
satisfied with $\alpha_k = 1$, i.e. process (2.12) will converge with a step
equal to unity. Then proceeding as in the proof of theorem 2.1 we
obtain the following estimate:

$$\|x_{k+1} - x_*\| \le \|(f''_0)^{-1}\|\,\|f''_0 - f''_{kc}\|\,\|x_k - x_*\| \le q\,\|x_k - x_*\| \tag{2.14}$$

where $q = \frac{1}{m} \max_{x \in S} \|f''_0 - f''(x)\|$. This shows that the value of the
ratio $q$ depends on the choice of the initial point $x_0$: the closer $x_0$
lies to the solution $x_*$, the smaller $q$ becomes.
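
The dependence of the ratio $q$ on the initial point can be seen on a small example. The sketch below is only an illustration under stated assumptions: the test function $f(x) = e^x - 2x$ (minimum at $x_* = \ln 2$), the unit step $\alpha_k = 1$ and the helper name `frozen_newton_ratios` are chosen here and do not come from the text. It runs process (2.12) with the second derivative frozen at $x_0$ and prints the observed ratios $|x_{k+1} - x_*| / |x_k - x_*|$.

```python
import math

def frozen_newton_ratios(x0, steps=6):
    """Run x_{k+1} = x_k - f'(x_k) / f''(x_0) (unit step) on the
    illustrative function f(x) = exp(x) - 2x and return the error
    ratios |x_{k+1} - x*| / |x_k - x*|."""
    grad = lambda x: math.exp(x) - 2.0   # f'(x)
    hess0 = math.exp(x0)                 # f''(x_0), evaluated only once
    x_star = math.log(2.0)               # exact minimizer of f
    x, ratios = x0, []
    for _ in range(steps):
        x_next = x - grad(x) / hess0
        ratios.append(abs(x_next - x_star) / abs(x - x_star))
        x = x_next
    return ratios

far = frozen_newton_ratios(x0=1.2)    # starting point farther from x*
near = frozen_newton_ratios(x0=0.8)   # starting point closer to x*
print("x0 = 1.2:", [round(r, 3) for r in far])
print("x0 = 0.8:", [round(r, 3) for r in near])
```

In both runs the ratios settle near $1 - f''(x_*)/f''(x_0)$, which is smaller for the starting point closer to $x_*$, in agreement with estimate (2.14).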

For the variant of method (2.12) in which the step length is chosen
so that $f(x)$ attains its minimum along the direction of
motion, one can find, using inequality (2.14) and reasoning as in
obtaining estimates (2.10) and (2.11), that

$$\|\tilde{x}_{k+1} - x_*\| \le \left(\frac{M}{m}\right)^{1/2} q\,\|x_k - x_*\| = \tilde{q}\,\|x_k - x_*\|$$

where $\tilde{q} = \dfrac{M^{1/2}}{m^{3/2}} \max_{x \in S} \|f''_0 - f''(x)\| \to 0$ if $x_0 \to x_*$.
Another possible modification of Newton's method is the following.
Let $k = \xi t + i$, $\xi = 0, 1, \ldots$, $i = 0, 1, \ldots, t - 1$, where $t \ge 1$ is
an arbitrary integer. We can construct an iterative process

$$x_{\xi t + i + 1} = x_{\xi t + i} - \alpha_{\xi t + i} (f''_{\xi t})^{-1} f'_{\xi t + i}, \qquad \alpha_{\xi t + i} \ge 0$$

or, with the original notation,

$$x_{k+1} = x_k - \alpha_k (f''_{\xi t})^{-1} f'_k, \qquad \alpha_k \ge 0. \tag{2.15}$$

Such a method takes an intermediate position between algorithm
(2.1), in which a new matrix $(f''_k)^{-1}$ is used for the construction of every vector $p_k$,
and algorithm (2.12), in which the same matrix $(f''_0)^{-1}$ is always used in determining the direction
of motion. In method (2.15)
a new matrix is applied after every $t$ steps. This algorithm, as well as methods
(2.1) and (2.12), can be considered to be a variant of the gradient
method (1.20); therefore its convergence with different ways of
choosing the step length follows from theorems 1.6 and 1.7.

We consider the rate of convergence of method (2.15), assuming
that the step length is chosen according to condition (2.2) and that
conditions (2.4) and (2.8) are valid for function $f(x)$.

Using Taylor's formula we obtain

$$f_{k+1} - f_k \le \alpha_k (f'_k, p_k) \left(1 - \frac{\alpha_k}{2} - \frac{\alpha_k}{2m}\,\|f''_{kc} - f''_{\xi t}\|\right).$$

Because of the convergence of the process we have

$$\|x_{kc} - x_{\xi t}\| \le \|x_{\xi t} - x_{\xi t + 1}\| + \ldots + \|x_{\xi t + i} - x_{\xi t + i + 1}\| \to 0$$

as $k \to \infty$, therefore $\|f''_{kc} - f''_{\xi t}\| \to 0$. Taking this into account
and reasoning as we did in theorem 2.1, we can prove that from a
certain iteration on, method (2.15) will converge with a step equal
to unity: $\alpha_k = 1$. Then, by theorem 2.2, the following estimate is
valid:

$$\|x_{\xi t + 1} - x_*\| \le \frac{R}{m}\,\|x_{\xi t} - x_*\|^2 \tag{2.16}$$

for any $\xi \ge L$, where $L$ is a positive number. Further, proceeding
as in theorem 2.1 we can obtain the estimate


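$$\|x_{\xi t + 2} - x_*\| = \|x_{\xi t + 1} - (f''_{\xi t})^{-1} f'_{\xi t + 1} - x_*\| \le \|(f''_{\xi t})^{-1}\|\,\|f''_{\xi t} - f''_{(\xi t + 1)c}\|\,\|x_{\xi t + 1} - x_*\|$$

where $x_{(\xi t + 1)c} = x_{\xi t + 1} + \theta (x_* - x_{\xi t + 1})$, $\theta \in [0, 1]$. We have by (2.8)

$$\|f''_{\xi t} - f''_{(\xi t + 1)c}\| \le \|f''_{\xi t} - f''_*\| + \|f''_* - f''_{(\xi t + 1)c}\| \le R \left(\|x_{\xi t} - x_*\| + \|x_{\xi t + 1} - x_*\|\right).$$

Using estimate (2.16) we now obtain

$$\|x_{\xi t + 2} - x_*\| \le \frac{R}{m} \left(\|x_{\xi t} - x_*\| + \|x_{\xi t + 1} - x_*\|\right) \|x_{\xi t + 1} - x_*\| \le \frac{R^2}{m^2}\,\|x_{\xi t} - x_*\|^3 \left(1 + \frac{R}{m}\,\|x_{\xi t} - x_*\|\right),$$

i.e.

$$\|x_{\xi t + 2} - x_*\| \le C_2 \|x_{\xi t} - x_*\|^3, \qquad C_2 < \infty .$$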
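Suppose that for a certain $2 \le j \le t - 1$ the following estimate
is satisfied:

$$\|x_{\xi t + j} - x_*\| \le C_j \|x_{\xi t} - x_*\|^{j+1}, \qquad C_j < \infty .$$

Then we have

$$\|x_{\xi t + j + 1} - x_*\| = \|x_{\xi t + j} - (f''_{\xi t})^{-1} f'_{\xi t + j} - x_*\| \le \|(f''_{\xi t})^{-1}\|\,\|f''_{\xi t} - f''_{(\xi t + j)c}\|\,\|x_{\xi t + j} - x_*\|$$

$$\le \frac{R}{m} \left(\|x_{\xi t} - x_*\| + \|x_{\xi t + j} - x_*\|\right) \|x_{\xi t + j} - x_*\| \le C_{j+1} \|x_{\xi t} - x_*\|^{j+2},$$

$$C_{j+1} = \frac{R}{m}\,C_j \left(1 + C_j \|x_{\xi t} - x_*\|^{j}\right).$$

Thus the following bound on the rate of convergence is valid
for method (2.15):

$$\|x_{\xi t + t} - x_*\| \le C\,\|x_{\xi t} - x_*\|^{t+1}. \tag{2.17}$$

This estimate means that the sequence $\{x_{\xi t}\}$ converges to the solution
at a rate of order $t + 1$.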
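
A rough numerical check of estimate (2.17) can be sketched as follows. The test function $f(x) = e^x - 2x$, the unit step and the helper name `periodic_newton_errors` are assumptions made here for illustration only; they are not part of the text. The second derivative is re-evaluated once per cycle of $t$ steps, and the error is recorded at the points $x_{\xi t}$.

```python
import math

def periodic_newton_errors(x0, t, cycles):
    """Method (2.15) with unit step on f(x) = exp(x) - 2x: the second
    derivative is refreshed once every t steps. Returns the errors
    |x_{xi*t} - x*| measured at the refresh points."""
    grad = lambda x: math.exp(x) - 2.0
    hess = lambda x: math.exp(x)
    x_star = math.log(2.0)
    x = x0
    errors = [abs(x - x_star)]
    for _cycle in range(cycles):
        h = hess(x)                  # one second-derivative evaluation per cycle
        for _step in range(t):
            x -= grad(x) / h         # t steps reuse the same value h
        errors.append(abs(x - x_star))
    return errors

errors_t2 = periodic_newton_errors(x0=1.2, t=2, cycles=2)
# Observed order between consecutive refresh points: log e_new / log e_old
order = math.log(errors_t2[2]) / math.log(errors_t2[1])
print(errors_t2, order)
```

For $t = 2$ the observed order is close to $t + 1 = 3$, while only one second-derivative evaluation is spent per two steps.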


Discussion of the Properties of Newton’s Method 


We have established that Newton's method with adjustment of
steps converges to the solution whatever the initial point $x_0$, at
a rate either superlinear or quadratic depending on the requirements
satisfied by function $f(x)$.

The convergence of method (2.1) from any initial approximation
$x_0$ is its essential advantage over the usual Newton method, in
which convergence is ensured only if the initial approximation is
sufficiently good (i.e. sufficiently close to the solution of the problem).
Besides, in applying Newton's method it is in practice difficult to check the conditions
which guarantee that the initial approximation ensures the
convergence of the process, since
this requires data about the function that are usually unknown
(for instance, the values of the constants $m$, $M$).
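
The difference between the two methods can be illustrated by a small sketch. Everything specific in it is an assumption made for illustration: the test function $f(x) = \sqrt{1 + x^2}$, the constant $\varepsilon = 0.25$, the step-halving rule, and the reading of condition (2.2) as $f(x_k + \alpha p_k) - f(x_k) \le \varepsilon \alpha (f'_k, p_k)$, which is how the condition is used later in this chapter. For this function the pure Newton step is $x_{k+1} = -x_k^3$, which diverges from any $|x_0| > 1$.

```python
import math

def f(x):
    return math.sqrt(1.0 + x * x)

def grad(x):
    return x / math.sqrt(1.0 + x * x)

def hess(x):
    return (1.0 + x * x) ** -1.5

def newton(x, damped, eps=0.25, iters=8, tol=1e-6):
    """Newton iteration on f(x) = sqrt(1 + x^2). With damped=True the
    step is halved until the assumed form of condition (2.2) holds."""
    for _ in range(iters):
        g = grad(x)
        if abs(g) < tol:             # gradient small enough: stop
            break
        p = -g / hess(x)             # Newton direction
        alpha = 1.0
        if damped:
            while f(x + alpha * p) - f(x) > eps * alpha * g * p and alpha > 1e-12:
                alpha *= 0.5
        x += alpha * p
    return x

x_damped = newton(2.0, damped=True)
x_pure = newton(2.0, damped=False, iters=4)   # four steps already show divergence
print(x_damped, x_pure)
```

Starting from $x_0 = 2$, the damped iteration first accepts $\alpha = 1/4$ and then proceeds with unit steps towards the minimum at $x_* = 0$, whereas the undamped iterates grow without bound.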

A comparison of the two methods of choosing the step length,
according to condition (2.2) or (2.3), is in favour of the former: it
requires fewer evaluations of the function
(in particular, from a certain iteration on the former method
requires the function to be evaluated only once, since $\alpha_k = 1$) and
guarantees a rate of convergence not slower than that of the latter
method.

If we compare Newton’s method and gradient methods as applied 
to solving problems of convex function minimization, it becomes 
evident that Newton’s method ensures a faster rate of convergence 
of the sequence of approximations to the solution. Thus if we consider 
the rate of convergence to mean effectiveness of a method, then our 
supposition stated at the beginning of this section that Newton’s 
method must be far more effective than the gradient methods is 
justified. However a more precise meaning of the concept of effecti- 
veness of a method is based on estimating the amount of computa- 
tion involved when applying a concrete algorithm for the solution 
of a problem to the required accuracy. Consequently the effectiveness 
of an algorithm can be estimated by the number of iterations, which 
are necessary for solving the problem, and the amount of computa- 
tions at each iteration. 

The amount of computation per iteration in Newton's method
is as a rule considerably greater than in the gradient methods because
of the necessary computation and inversion of the matrices of
second derivatives. On the other hand, Newton's method usually
requires tens or hundreds of times fewer iterations than gradient
methods; by virtue of this fact Newton's method proves to be considerably
more effective.

Nevertheless in many problems the labour per iteration in Newton's
method can prove excessively great because it is necessary
to calculate the matrices of second derivatives $f''(x)$ (as a rule, in solving
extremal problems the greatest difficulty is the calculation of the
matrix $f''(x)$ and not its inversion). Such problems will be considered
later on. In order to solve the problem in such a case, one can make
use of one of the modifications of Newton's method which we have
studied. In one of the modifications we have to calculate and invert
the matrix $f''(x)$ only once; in the other this is done after a finite
number of iterations. If the initial approximation is good enough,
then the rate of convergence to the solution will be fast. However,
using modifications of Newton's method is not a radical solution
of the problem of reducing the amount of work required to solve the
problem (speaking generally, it can become even greater). Therefore
we come to the question of the possibility of constructing minimization
methods which would be close to Newton's method as to their
rate of convergence and would require considerably less computation
at every iteration.

Several such methods have been worked out; they are based on 
different ideas. As a rule they prove more effective than Newton's 
method and this is why they are used more and more at present. 
The next three sections are devoted to the study of such algorithms. 


3. METHODS OF DUAL DIRECTIONS 


Considerations on the Choice 
of Schemes of the Methods 


In the preceding section we noted that the main difficulty in 
applying Newton’s method is the necessity of evaluating the matrix 
of second derivatives of the function being minimized. Consequently 
algorithms which would be more effective than Newton’s method 
should exclude the calculation of second derivatives, providing 
however the rate of convergence of Newton’s method. 

The question arises whether it is possible, in constructing the
sequence of approximations to the solution, to determine directions
$p_k$ which would be close to those in Newton's method by using for
this purpose only the first derivative of the function being minimized.

The first and the second derivatives of $f(x)$ can be related by
Taylor's formula for operators (the gradient $f'(x)$ is one):

$$f'(y) - f'(x) = f''(x)(y - x) + \omega(x, y - x) \tag{3.1}$$

where $\|\omega(x, y - x)\| = o(\|y - x\|)$.

The equality (3.1) suggests that if we calculate the derivatives
$f'(x)$ at arbitrary but close points $x_1, \ldots, x_{n+1}$ and determine
the square $n \times n$ matrix $A$ with the aid of a system of (vector)
equations

$$f'(x_{i+1}) - f'(x_i) = A (x_{i+1} - x_i), \qquad i = 1, \ldots, n \tag{3.2}$$

(assuming of course vectors $x_{i+1} - x_i$, $i = 1, \ldots, n$, to be linearly
independent), then matrix $A$ must be close to the matrix of second
derivatives calculated at any point $x_j$.

In fact, by (3.1) for any $i$ we have

$$f'(x_{i+1}) - f'(x_i) = f''(x_i)(x_{i+1} - x_i) + \omega(x_i, x_{i+1} - x_i)$$

and therefore using (3.2) we obtain

$$A (x_{i+1} - x_i) = f''(x_i)(x_{i+1} - x_i) + \omega(x_i, x_{i+1} - x_i), \qquad i = 1, \ldots, n.$$

This system of equations can be rewritten in the following form:

$$A (x_{i+1} - x_i) = f''_j (x_{i+1} - x_i) + (f''_i - f''_j)(x_{i+1} - x_i) + \omega(x_i, x_{i+1} - x_i),$$
$$i = 1, \ldots, n, \qquad 1 \le j \le n. \tag{3.3}$$

If matrix $f''(x)$ is continuous and nonsingular, then by virtue of the
assumed closeness of points $x_i$ the sum of the last two terms of the
right-hand side of each equation in system (3.3) must be considerably
less than the first term, i.e.

$$A (x_{i+1} - x_i) \approx f''(x_j)(x_{i+1} - x_i), \qquad i = 1, 2, \ldots, n$$


and this in general means that matrices $A$ and $f''_j$, $j = 1, \ldots, n$,
must be close to each other.

It can easily be seen how the above
considerations can be used for the construction of iterative processes
of minimization. If $\{x_k\}$ is an arbitrarily constructed sequence which
converges to the minimum point of $f(x)$, then in a sufficiently small
neighbourhood of the minimum point the points $x_k, x_{k-1}, \ldots, x_{k-n}$
are close to one another. Therefore, having defined matrix $A_k$ by the
system of equations

$$A_k (x_{k-i} - x_{k-i-1}) = f'(x_{k-i}) - f'(x_{k-i-1}), \qquad i = 0, 1, \ldots, n - 1$$

we can construct the $(k + 1)$-th approximation using the formula

$$x_{k+1} = x_k - \alpha_k A_k^{-1} f'_k, \qquad \alpha_k \ge 0. \tag{3.4}$$

If matrix $A_k$ happens to be sufficiently close to matrix $f''_k$, then
direction $p_k = -A_k^{-1} f'_k$ will be close to direction $-(f''_k)^{-1} f'_k$ (i.e. to
the direction of motion in Newton's method) and therefore will be
a direction of descent. If we continue in a similar way to determine
matrices $A_{k+1}, A_{k+2}, \ldots$, then by virtue of their being close
to matrices $f''_{k+1}, f''_{k+2}, \ldots$ process (3.4) must be close in its properties
to Newton's method. At the same time method (3.4) does
not require the calculation of the second derivatives of the function.
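
The closeness of the matrix defined by gradient differences to the matrix of second derivatives can be checked numerically. The following is a minimal sketch under assumptions made here: the two-dimensional test function $f(x) = e^{x^1} + e^{x^2} + (x^1)^2 + (x^2)^2 + x^1 x^2$, the particular points, and the helper names `secant_matrix` and `secant_error` are all illustrative. The matrix $A$ is obtained from system (3.2) for $n = 2$ and compared with $f''(x_1)$ for points that are progressively closer together.

```python
import math

def grad(x):
    """Gradient of the illustrative function
    f(x) = exp(x0) + exp(x1) + x0^2 + x1^2 + x0*x1."""
    return [math.exp(x[0]) + 2 * x[0] + x[1],
            math.exp(x[1]) + 2 * x[1] + x[0]]

def hessian(x):
    return [[math.exp(x[0]) + 2.0, 1.0],
            [1.0, math.exp(x[1]) + 2.0]]

def secant_matrix(points):
    """Solve system (3.2) for n = 2: A (x_{i+1} - x_i) = f'(x_{i+1}) - f'(x_i)."""
    d = [[points[i + 1][c] - points[i][c] for c in range(2)] for i in range(2)]
    g = [[grad(points[i + 1])[c] - grad(points[i])[c] for c in range(2)]
         for i in range(2)]
    # A = G D^{-1}, where the columns of D and G are d[i] and g[i]
    det = d[0][0] * d[1][1] - d[1][0] * d[0][1]
    return [[(g[0][r] * d[1][1] - g[1][r] * d[0][1]) / det,
             (g[1][r] * d[0][0] - g[0][r] * d[1][0]) / det] for r in range(2)]

def secant_error(h):
    """Largest entry of |A - f''(x_1)| for three points spaced about h apart."""
    x1 = [0.3, -0.2]
    x2 = [x1[0] + h, x1[1] + 0.5 * h]
    x3 = [x2[0] - 0.3 * h, x2[1] + h]
    A = secant_matrix([x1, x2, x3])
    H = hessian(x1)
    return max(abs(A[r][c] - H[r][c]) for r in range(2) for c in range(2))

errors = [secant_error(h) for h in (1e-1, 1e-2, 1e-3)]
print(errors)
```

The printed errors shrink roughly in proportion to the spacing of the points, although only first derivatives are evaluated.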

On the basis of the preceding reasoning it proves possible to construct
a whole class of descent processes which have a superlinear rate of
convergence and whose implementation does not require the calculation
of second derivatives of the function. We call these processes
methods of dual directions. The origin of this name will become
clear later on, when we discuss methods for calculating matrix
$A_k^{-1}$ and vector $p_k$.

We now turn to the rigorous justification of methods of type (3.4).
Substantiation of the Methods


Suppose that $f(x)$ is a function which has continuous first and
second derivatives. Let an infinite sequence of elements $\{x_k\}$ be given.
We associate with $\{x_k\}$ the sequence $\{y_k\}$ defined by
the formula

$$y_k = x_k + r_k \tag{3.5}$$

where vectors $r_k$ are such that the following two conditions are
satisfied:

(1) If $\Delta_k$ is the determinant whose columns are the vectors $r_k / \|r_k\|, \ldots, r_{k-n+1} / \|r_{k-n+1}\|$,
then for any $k \ge n - 1$ we have $|\Delta_k| \ge \varepsilon$, where
$\varepsilon$ is an arbitrarily small positive number.

(2) $\|r_k\| \to 0$ as $k \to \infty$. In other respects the choice of vectors $r_k$
is arbitrary.

The first of the requirements that vectors $r_k$ must satisfy is, in
point of fact, the requirement of their linear independence.

Lemma 3.1. Let $\{x_k\}$ be a bounded sequence with $\|x_{k+1} - x_k\| \to 0$
as $k \to \infty$, and for any $k \ge n - 1$ let matrix $A_k$ be defined by the following
system of equations

$$A_k r_{k-i} = e_{k-i}, \qquad i = 0, 1, \ldots, n - 1 \tag{3.6}$$

where $e_{k-i} = f'(y_{k-i}) - f'(x_{k-i})$ and $r_k$, $y_k$ are elements of sequence
(3.5). Then we have

$$\lim_{k \to \infty} \|A_k - f''(x_k)\| = 0.$$


Proof. Using formula (5.1) of Chap. I to express the operator $f'(x)$
in terms of its derivative, we can write

$$f'(y_{k-i}) - f'(x_{k-i}) = \int_0^1 f''(x_{k-i} + \tau (y_{k-i} - x_{k-i}))\, r_{k-i}\, d\tau$$

$$= \int_0^1 f''(x_{k-i})\, r_{k-i}\, d\tau + \int_0^1 \left[f''(x_{k-i} + \tau r_{k-i}) - f''(x_{k-i})\right] r_{k-i}\, d\tau$$

$$= f''(x_{k-i})\, r_{k-i} + \int_0^1 \left[f''(x_{k-i} + \tau r_{k-i}) - f''(x_{k-i})\right] r_{k-i}\, d\tau .$$

Using this expression we have

$$(A_k - f''(x_k))\, r_{k-i} = (f''(x_{k-i}) - f''(x_k))\, r_{k-i} + \int_0^1 \left[f''(x_{k-i} + \tau r_{k-i}) - f''(x_{k-i})\right] r_{k-i}\, d\tau .$$
Introducing the notation $A_k - f''_k = B_k$, we obtain

$$\|B_k r_{k-i}\| \le \|f''_{k-i} - f''_k\|\,\|r_{k-i}\| + \sup_{0 \le \tau \le 1} \|f''(x_{k-i} + \tau r_{k-i}) - f''(x_{k-i})\|\,\|r_{k-i}\| . \tag{3.7}$$

Since $\{x_k\}$ is a bounded sequence, for any $k$ we have $x_k \in Q$, where
$Q \subset E^n$ is a closed bounded set. Function $f''(x)$ is uniformly continuous
on set $Q$; consequently $\|f''_{k-i} - f''_k\| = \lambda_{k-i} \to 0$ and
$\sup_{0 \le \tau \le 1} \|f''(x_{k-i} + \tau r_{k-i}) - f''(x_{k-i})\| = \mu_{k-i} \to 0$ as $k \to \infty$.

Thus it follows from (3.7) that

$$\|B_k r_{k-i}\| \le (\lambda_{k-i} + \mu_{k-i})\,\|r_{k-i}\| = h_{k-i}\,\|r_{k-i}\| \tag{3.8}$$

where $h_{k-i} \to 0$ as $k \to \infty$.
According to the definition of the operator norm, $\|B_k\| = \max_{\|z\| = 1} \|B_k z\|$.
Let the maximum be attained at element $z_k$. If

$$z_k = \delta_k \frac{r_k}{\|r_k\|} + \ldots + \delta_{k-n+1} \frac{r_{k-n+1}}{\|r_{k-n+1}\|},$$

then because of the condition $|\Delta_k| \ge \varepsilon > 0$ the coefficients $\delta_{k-i}$
will be bounded: $|\delta_{k-i}| \le C$, $i = 0, 1, \ldots, n - 1$. Using the
expression for vector $z_k$ we obtain

$$\|B_k\| = \|B_k z_k\| = \left\| \sum_{i=0}^{n-1} \delta_{k-i} \frac{B_k r_{k-i}}{\|r_{k-i}\|} \right\| \le \sum_{i=0}^{n-1} |\delta_{k-i}|\, \frac{\|B_k r_{k-i}\|}{\|r_{k-i}\|} .$$

Hence, taking into account (3.8) and the fact that $|\delta_{k-i}|$ is
bounded, we have

$$\|B_k\| \le \sum_{i=0}^{n-1} |\delta_{k-i}|\, h_{k-i} \le C \sum_{i=0}^{n-1} h_{k-i} \to 0$$

as $k \to \infty$. The lemma is proved.

The results of the above lemma open the way to the construction
of methods of the type (3.4).

Lemma 3.2. If $f(x)$ is a continuously differentiable strongly convex
function and sequence $\{x_k\}$ is such that $f_{k+1} \le f_k$ and $(f'_k, x_{k+1} - x_k) \to 0$
as $k \to \infty$, then $\|x_{k+1} - x_k\| \to 0$.

Proof. According to condition $f_{k+1} \le f_k$ we have $x_{k+1} \in S_k$, $S_k = \{x : f(x) \le f(x_k)\}$,
for any $k$. Set $S_k$ is strongly convex since $f(x)$
is a strongly convex function (lemma 2.8 of Chap. I). Then there is
a positive number $\lambda > 0$ such that any point $\frac{1}{2}(x_k + x_{k+1}) + \xi$, where
$\|\xi\| \le \lambda \|x_{k+1} - x_k\|^2$, is an internal point of set $S_k$. Let $x_{k+1} - x_k = v + \omega$,
where $v \in T_k$, $T_k$ is the plane tangent to set $S_k$ at point $x_k$,
and $\omega \perp T_k$. Then, noting that $f'(x_k) \perp T_k$, we obtain

$$|(f'_k, x_{k+1} - x_k)| = |(f'_k, v + \omega)| = \|f'_k\|\,\|\omega\| .$$

But $\|\omega\| \ge \|\xi\|$, since otherwise in addition to point $x_k$ set $S_k$ and
plane $T_k$ would have other points in common, which contradicts
the strong convexity of $S_k$. Therefore

$$\frac{1}{\lambda}\,|(f'_k, x_{k+1} - x_k)| \ge \|f'_k\|\,\|x_{k+1} - x_k\|^2 .$$

Hence if $\|f'_k\|$ does not tend to zero, then $\|x_{k+1} - x_k\| \to 0$. But if $\|f'_k\| \to 0$ then,
since $f(x)$ is strongly convex, the maximum diameter $d_k$ of set $S_k$ tends to zero,
which again implies that $\|x_{k+1} - x_k\| \to 0$. The lemma is proved.

We can now discuss the properties of process (3.4). We shall study
this process assuming that the value of parameter $\alpha_k$ is chosen
according to condition (2.2), taking into account that in this case
$p_k = -A_k^{-1} f'_k$. The function being minimized is assumed to be
smooth and strongly convex. We justify the possibility of choosing
parameter $\alpha_k$ in the same way as was done in the preceding sections.
Expanding the function in Taylor's series, we obtain the following
estimate:

$$f_{k+1} - f_k \le \alpha_k (f'_k, p_k) \left[1 + \frac{\alpha_k}{2}\,\frac{(A_k p_k, p_k)}{(f'_k, p_k)} + \frac{\alpha_k}{2}\,\frac{a_k \|p_k\|^2}{(f'_k, p_k)}\right]$$

where $a_k = \|f''_{kc} - A_k\|$, $x_{kc} = x_k + \theta (x_{k+1} - x_k)$, $\theta \in [0, 1]$. Note
that

$$(f'_k, p_k) = -(A_k p_k, p_k) = -(f'_k, A_k^{-1} f'_k). \tag{3.9}$$

Taking (3.9) into account we have

$$f_{k+1} - f_k \le \alpha_k (f'_k, p_k) \left[1 - \frac{\alpha_k}{2} - \frac{\alpha_k a_k \|p_k\|^2}{2 (A_k p_k, p_k)}\right].$$

Hence condition (2.2) will always be satisfied if $\alpha_k$ satisfies the
following inequality:

$$1 - \frac{\alpha_k}{2} - \frac{\alpha_k a_k \|p_k\|^2}{2 (A_k p_k, p_k)} \ge \varepsilon . \tag{3.10}$$


Theorem 3.1. If $f(x)$ is a twice continuously differentiable function
for which conditions (2.4) are valid, matrix $A_k$ for any $k \ge n - 1$
is defined by system (3.6) and satisfies the condition

$$(A_k^{-1} f'_k, f'_k) > 0 \tag{3.11}$$

and $\alpha_k$ is determined according to condition (2.2), then whatever the
initial point $x_0$ the following statements are valid for sequence (3.4):
$f_{k+1} < f_k$ and $\|x_k - x_*\| \to 0$ at a superlinear rate of convergence:

$$\|x_{N+l+1} - x_*\| \le C \lambda_N \cdots \lambda_{N+l} \tag{3.12}$$

where $C, N < \infty$, $\lambda_{N+l} < 1$ for any $l \ge 0$, and $\lambda_i \to 0$ as $i \to \infty$.

Proof. In order to make use of the results of lemma 3.1 we must
first of all show that the conditions of the theorem imply that
$\|x_{k+1} - x_k\| \to 0$ for sequence (3.4).

According to conditions (3.9) and (3.11) we have $(f'_k, p_k) < 0$
for any $k$. Hence it follows, first, that there is always a value
$\alpha_k > 0$ such that inequality (3.10) is satisfied (and consequently
also (2.2)); secondly, by (2.2) we shall have $f_{k+1} < f_k$. This means
that $x_{k+1} \in S = \{x : f(x) \le f(x_0)\}$ for any $k$ and besides, since $f(x)$
has a lower bound, $f_k - f_{k+1} \to 0$; by virtue of this it follows from
(2.2) that

$$\alpha_k (f'_k, p_k) = (f'_k, x_{k+1} - x_k) \to 0. \tag{3.13}$$

Since $f_{k+1} < f_k$ and condition (3.13) is fulfilled, sequence $\{x_k\}$
satisfies the requirements of lemma 3.2. Hence, as $k \to \infty$,

$$\|x_{k+1} - x_k\| \to 0. \tag{3.14}$$

Thus the conditions of the theorem provide for the fulfillment of
all the requirements of lemma 3.1. It follows that

$$\|A_k - f''_k\| \to 0 \tag{3.15}$$

as $k \to \infty$. Taking into account conditions (2.4), for any $M_1$ and $m_1$
such that $M_1 > M$ and $0 < m_1 < m$ there is a number $L$ such that
for $k \ge L$ and any $y \in E^n$ we have

$$m_1 \|y\|^2 \le (A_k y, y) \le M_1 \|y\|^2 . \tag{3.16}$$

Because of conditions (3.9) and (3.16), from a certain $k$ on we shall
have $(f'_k, p_k) \le -m_1 \|p_k\|^2$. Consequently inequality (3.10) will
always be satisfied if the following inequality is satisfied:

$$1 - \frac{\alpha_k}{2} - \frac{\alpha_k a_k}{2 m_1} \ge \varepsilon, \qquad 0 < \varepsilon < \frac{1}{2} . \tag{3.17}$$

Noting that $a_k \le M + M_1 < \infty$, it is easily ascertained that
there is a constant $\bar{\alpha} > 0$ such that for any $k$ inequality (3.17)
will be satisfied with $\alpha_k \ge \bar{\alpha}$. By virtue of this it follows from (3.14)
that $\|p_k\| = \frac{1}{\alpha_k} \|x_{k+1} - x_k\| \to 0$.
Hence

$$\|f'_k\| = \|A_k p_k\| \le M_1 \|p_k\| \to 0.$$

The last condition, as shown by inequality (1.12), means that
$x_k \to x_*$. Let us establish that estimate (3.12) holds.

Since $\|p_k\| \to 0$, condition (3.15) is fulfilled and the second
derivatives of function $f(x)$ are uniformly continuous on set $S$,
we have as $k \to \infty$

$$a_k \le \|f''(x_k + \theta (x_{k+1} - x_k)) - f''(x_k)\| + \|f''(x_k) - A_k\| \to 0$$

and it follows that for any $0 < \varepsilon < \frac{1}{2}$, from a certain iteration
on, inequality (3.17) will be satisfied with $\alpha_k = 1$. This means that
process (3.4) will be implemented with a step equal to unity.
The left-hand inequality in (3.16) means that for
$k \ge L$ we shall have $\|A_k^{-1}\| \le \frac{1}{m_1}$ (lemma 2.9 of Chap. I). Together
with the possibility of inverting matrix $A_k$ for any $k \ge n - 1$,
this estimate makes it possible to conclude that there is a constant $M_2$
such that $\|A_k^{-1}\| \le M_2$ for $k \ge n - 1$. Taking this into account
we can establish the bounds on the rate of convergence as we did
in theorem 2.1:

$$\|x_{k+1} - x_*\| \le \|I - A_k^{-1} f''_{kc}\|\,\|x_k - x_*\| \le \|A_k^{-1}\|\,\|A_k - f''_{kc}\|\,\|x_k - x_*\| \le M_2 \|A_k - f''_{kc}\|\,\|x_k - x_*\|$$

or $\|x_{k+1} - x_*\| \le \lambda_k \|x_k - x_*\|$, where $\lambda_k = M_2 \|A_k - f''_{kc}\|$. Since
$\|A_k - f''_{kc}\| \le \|A_k - f''_k\| + \|f''_k - f''(x_* + \theta (x_k - x_*))\| \to 0$, there is
a number $N$ such that for $k = N + l$, $l = 0, 1, \ldots$, we shall have
$\lambda_{N+l} < 1$, and $\lambda_{N+l} \to 0$ as $l \to \infty$.

Setting $\|x_N - x_*\| = C$ and taking into account our remarks
about the values of $\lambda_k$ we obtain estimate (3.12). The theorem is
proved.

Condition (3.11) used in the theorem, because of (3.9), means
that the direction $p_k$ is a direction of descent of $f(x)$. It can occur
that $(A_k^{-1} f'_k, f'_k) \le 0$ at some iterations of the process. In this
case we can either change vector $r_k$ and construct a new matrix $A_k^{-1}$
(such that condition (3.11) is satisfied) or make a step in the
direction of the antigradient. The number of such steps will always
be finite, since in descending along the antigradient $\|x_{k+1} - x_k\| \to 0$
and if vectors $r_k$ satisfy the requirements formulated above, then
$\|A_k - f''_k\| \to 0$. Consequently, according to conditions (3.16) and
(3.9), from a certain iteration on we shall necessarily have
$(A_k^{-1} f'_k, f'_k) > 0$. However, if for a certain $k$ the condition
$(A_k^{-1} f'_k, f'_k) = -(p_k, f'_k) < 0$ is fulfilled, then it is easier to change
the sign of the scalar factor $\alpha_k$ in formula (3.4); then the motion from
point $x_k$ will be in the direction $-p_k$, i.e. in a direction of descent.
Construction of Various Algorithms


The requirements which must be satisfied by vectors $r_k$ used in
constructing sequence (3.5) are not strict and leave us great freedom
in the choice of these vectors. This makes it possible to construct different
algorithms of type (3.4), since different sequences $\{r_k\}$ will
define (by (3.6)) different sequences of matrices $A_k$.

Let us discuss some possible ways of constructing vectors $r_k$. We
can take as $r_k$ vectors directed along the axes of coordinates. For
example, if $r_0 = \lambda_0 v_1$, then for $k = tn + i$, where $t$ is an integer
and $i = 0, 1, \ldots, n - 1$, we have $r_k = \lambda_k v_{i+1}$; here $v_{i+1}$ is the unit
vector of the corresponding axis and $\lambda_k$ is a numerical factor such
that $\lambda_k \to 0$ as $k \to \infty$. Such a choice of vector $r_k$ guarantees that
the condition $|\Delta_k| \ge \varepsilon$ will be satisfied. In this case, in order to
determine matrix $A_k$, it is necessary to calculate at each iteration
the derivatives at two points, $x_k$ and $y_k$. The law of the decrease
of $\lambda_k$ may be chosen arbitrarily; computational practice shows, however,
that the maximum rate of convergence is obtained with monotonically
diminishing $\lambda_k$.

Another possible method of determining vectors $r_k$ is as follows.
For $k \ge n - 1$ we can, instead of (3.5), use sequence (3.4) directly,
i.e. assume $r_k = x_{k+1} - x_k = -\alpha_k A_k^{-1} f'_k$. In fact, the proof of
theorem 3.1 shows clearly that if $A_k^{-1}$ is an arbitrary matrix which
satisfies only condition (3.11) and $\alpha_k$ is chosen according to condition
(2.2), then $\|x_{k+1} - x_k\| \to 0$ as $k \to \infty$. Consequently, if we
use sequence (3.4) for constructing vectors $r_k$, then the requirement
$\|r_k\| \to 0$ will necessarily be fulfilled and we only need to ensure that
$|\Delta_k| \ge \varepsilon$. If this condition is not satisfied for a certain $k$, another
vector $r_k$ must be chosen (using not (3.4) but a new formula). In
such an algorithm, for determining matrix $A_k$ at every iteration
(where sequence (3.4) provides the fulfillment of the requirements
to be met by vectors $r_k$) the gradient must be calculated only at one
point $x_k$.

Of course, other methods of constructing vectors r, may be used. 

In the system of equations (3.6) which defines matrix $A_k$ only
one new vector $r_k$ and the corresponding vector $e_k$ are used for
each $k$; the remaining vectors $r_{k-1}, \ldots, r_{k-n+1}$ and $e_{k-1}, \ldots, e_{k-n+1}$
are taken from preceding iterations. System (3.6) can be
modified so that at each iteration of process (3.4) an arbitrary number
of vectors $r_{k-i_1}, \ldots, r_{k-i_j}$, $1 \le j \le n$ (and their corresponding
vectors $e_{k-i_1}, \ldots, e_{k-i_j}$) are renewed and the remaining $n - j$
vectors $r_{k-i_{j+1}}, \ldots, r_{k-i_n}$ are taken from preceding iterations.
In this case system (3.6) should preferably be written in the form

$$A_k r_i = e_i, \qquad i = 1, \ldots, n. \tag{3.18}$$
If the requirements of lemma 3.1 are retained, then repeating its
proof word for word, it can be ascertained that for matrix $A_k$ defined
by system (3.18) the condition $\|A_k - f''_k\| \to 0$ as $k \to \infty$ is also
satisfied.

Using different methods for constructing vectors $r_i$ in system (3.18)
one can obtain several well-known minimization algorithms. For
instance, if we set $r_i = v_{ki}$ (and $y_i = x_k + v_{ki}$), where $v_{ki}$ is a vector
directed along the $i$-th axis of coordinates and such that $\|v_{ki}\| \to 0$
as $k \to \infty$, then system (3.18) will take the following form:

$$A_k v_{ki} = f'(x_k + v_{ki}) - f'(x_k), \qquad i = 1, \ldots, n.$$

Matrix $A_k$ defined by this system is a finite-difference analogue of
the matrix of second derivatives $f''(x_k)$; thus in this case process (3.4)
becomes a finite-difference analogue of Newton's method
with adjustment of step length. On the basis of theorem 3.1 it can
be asserted that Newton's finite-difference method with adjustment
of step length converges from any initial approximation at a superlinear
rate. Assuming that matrix $f''(x)$ satisfies Lipschitz' condition
(2.8), it can be shown, using the preceding results, that if
$\|v_{ki}\| \le \|f'_k\|$, Newton's finite-difference method converges at
a quadratic rate. This can be seen from the following argument.
In lemma 3.1 the bound on the quantity $B_k r_i = B_k v_{ki}$ takes the form

$$\|B_k v_{ki}\| \le \sup_{0 \le \tau \le 1} \|f''(x_k + \tau v_{ki}) - f''(x_k)\|\,\|v_{ki}\| .$$

Hence, using (2.8), $\|B_k v_{ki}\| \le R \|v_{ki}\|^2$. From this inequality and
the estimate $\|v_{ki}\| \le \|f'_k\|$, as in lemma 3.1, we obtain

$$\|A_k - f''_k\| = \|B_k\| \le R \sum_{i=1}^{n} |\delta_i|\,\|v_{ki}\| \le R \sum_{i=1}^{n} |\delta_i|\,\|f'_k\| = \bar{R}\,\|f'_k\| .$$

According to (2.4) the gradient $f'(x)$ satisfies Lipschitz' condition
with constant $M$. Consequently,

$$\|A_k - f''_k\| \le \bar{R}\,\|f'_k\| = \bar{R}\,\|f'_k - f'_*\| \le \bar{R} M \|x_k - x_*\| .$$

The bounds on the rate of convergence obtained in theorem 3.1
can now be made more precise as follows:

$$\|x_{k+1} - x_*\| \le M_2 \|A_k - f''_{kc}\|\,\|x_k - x_*\|$$
$$\le M_2 \left(\|A_k - f''_k\| + \|f''_k - f''(x_* + \theta (x_k - x_*))\|\right) \|x_k - x_*\|$$
$$\le M_2 (\bar{R} M + R)\,\|x_k - x_*\|^2 ,$$

i.e.

$$\|x_{k+1} - x_*\| \le C\,\|x_k - x_*\|^2 .$$
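
The finite-difference variant can be sketched in a few lines. The sketch below rests on several assumptions made here for illustration: the two-dimensional test function $f(x) = e^{x^1} + e^{x^2} + (x^1)^2 + (x^2)^2 + x^1 x^2$, a unit step $\alpha_k = 1$ throughout (the step adjustment by condition (2.2) is omitted because the starting point is close enough for this function), and the increment length $\min(\|f'_k\|, 0.1)$, which satisfies $\|v_{ki}\| \le \|f'_k\|$. The name `fd_newton` is hypothetical.

```python
import math

def grad(x):
    """Gradient of f(x) = exp(x0) + exp(x1) + x0^2 + x1^2 + x0*x1."""
    return [math.exp(x[0]) + 2 * x[0] + x[1],
            math.exp(x[1]) + 2 * x[1] + x[0]]

def fd_newton(x, tol=1e-10, max_iter=50):
    """Finite-difference analogue of Newton's method in two variables,
    with coordinate increments of length h <= ||f'(x_k)||."""
    for it in range(max_iter):
        g = grad(x)
        gnorm = math.hypot(g[0], g[1])
        if gnorm < tol:
            return x, it
        h = min(gnorm, 0.1)
        gx = grad([x[0] + h, x[1]])      # gradient at x_k + v_k1
        gy = grad([x[0], x[1] + h])      # gradient at x_k + v_k2
        # Columns of A_k from A_k v_ki = f'(x_k + v_ki) - f'(x_k)
        a11, a21 = (gx[0] - g[0]) / h, (gx[1] - g[1]) / h
        a12, a22 = (gy[0] - g[0]) / h, (gy[1] - g[1]) / h
        det = a11 * a22 - a12 * a21
        # Solve A_k p = -f'(x_k) by Cramer's rule
        p = [(-g[0] * a22 + g[1] * a12) / det,
             (-g[1] * a11 + g[0] * a21) / det]
        x = [x[0] + p[0], x[1] + p[1]]
    return x, max_iter

x_min, iterations = fd_newton([0.5, 0.5])
print(x_min, iterations)
```

Only gradients are evaluated, three per iteration in this two-dimensional case, and the gradient norm falls below the tolerance after a handful of iterations.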




Let us describe another method of choosing vectors $r_i$. We set
$r_i = -\lambda_k f'(y_i)$, $y_1 = x_k$, $y_{i+1} = y_i + r_i$, $i = 1, 2, \ldots, n$. Then
system (3.18) takes the following form:

$$-\lambda_k A_k f'(y_i) = f'(y_{i+1}) - f'(y_i), \qquad i = 1, \ldots, n.$$

Matrix $A_k$ defined by such a system of equations (accurate to
a numerical factor $\lambda_k$) is used to construct an iterative process (Aitken
and Steffensen's method). We shall not study the properties of this
method here. Note only that by varying the value of factor $\lambda_k$
a quadratic rate of convergence of the process can be obtained.


Determining Vector $p_k$


The amount of work required to compute vector $p_k$ determines
to a considerable extent the computational effort in process (3.4).

We shall now consider a method of constructing vector $p_k = -A_k^{-1} f'_k$
which makes use of the specific properties of system (3.6)
that defines matrix $A_k$. By taking account of these properties one
can considerably simplify the construction of vector $p_k$; we begin
with the inversion of matrix $A_k$.

The necessary condition for the existence of matrix $A_k^{-1}$ is that
matrix $A_k$ be nonsingular, and this in turn requires linear independence
of the vector system $e_k, \ldots, e_{k-n+1}$. With sufficiently
large $k$, matrix $A_k$ is nonsingular as shown by (3.16). However, at
some iterations of the initial stage of process (3.4) the vector system
$e_k, \ldots, e_{k-n+1}$ can prove linearly dependent. In this case we can
either change one of the vectors $r_{k-i}$ or make a step in the direction
of the antigradient, which changes the system $e_k, \ldots, e_{k-n+1}$.
We assume in what follows that for any $k \ge n - 1$ the system $e_k, \ldots, e_{k-n+1}$
is linearly independent.

In this case system (3.6) can be written in the form


$$A_k^{-1} e_{k-i} = r_{k-i}, \qquad i = 0, 1, \ldots, n - 1$$

or in the form of a matrix equation:

$$A_k^{-1} E_k = R_k \tag{3.19}$$

where $E_k$, $R_k$ are matrices whose columns are the coordinates of vectors
$e_{k-i}$ and $r_{k-i}$ respectively. From (3.19) we obtain

$$A_k^{-1} = R_k E_k^{-1}. \tag{3.20}$$

Thus in order to construct matrix $A_k^{-1}$ it is necessary first of all
to calculate matrix $E_k^{-1}$. It is known from linear algebra (see, for
instance, D. K. Faddeev and V. N. Faddeeva) that the rows of matrix $E_k^{-1}$
are the vectors of the basis $s_k, \ldots, s_{k-n+1}$ which is dual (biorthogonal)
to basis $e_k, \ldots, e_{k-n+1}$. Recall that linearly independent systems
of vectors $a_1, \ldots, a_n$ and $b_1, \ldots, b_n$ are called dual if they satisfy
the conditions

$$(a_i, b_j) = 0 \ \text{for} \ i \ne j, \qquad (a_i, b_i) = 1 .$$

If $s_k, \ldots, s_{k-n+1}$ is the dual of basis $e_k, \ldots, e_{k-n+1}$, then
(according to the relations of duality) $S_k^* E_k = I$, where $S_k$ is a matrix
whose columns are vectors $s_{k-i}$ and the superscript $*$ denotes transposition.
It follows that $S_k^* = E_k^{-1}$.

Each of the matrices $E_k$, $k = 0, 1, \ldots$, differs from the next
one by only one column: the oldest column is dropped on one side and a new column is added on the other. By virtue
of this fact the basis $s_k, \ldots, s_{k-n+1}$ can
be constructed by recursive relations, and this to a considerable extent
reduces the computational effort.

Suppose we have calculated matrix $E_k^{-1}$, i.e. have constructed the
basis $s_k, \ldots, s_{k-n+1}$. Let us construct the new system of vectors
as follows:

$$s_{k+1} = \frac{s_{k-n+1}}{(s_{k-n+1}, e_{k+1})}, \qquad \bar{s}_{k+1-j} = s_{k+1-j} - (s_{k+1-j}, e_{k+1})\, s_{k+1}, \quad j = 1, \ldots, n - 1 \tag{3.21}$$

(the bar marks the updated vector, which replaces $s_{k+1-j}$ in the new system $s_{k+1}, s_k, \ldots, s_{k-n+2}$).

It follows from the linear independence of vectors $e_{k+1}, e_k, \ldots, e_{k-n+2}$ that

$$(s_{k-n+1}, e_{k+1}) \ne 0. \tag{3.22}$$

Indeed, by virtue of the duality of the bases $s_k, \ldots, s_{k-n+1}$ and
$e_k, \ldots, e_{k-n+1}$ we have $(s_{k-n+1}, e_{k-j}) = 0$ for $j = 0, 1, \ldots, n - 2$,
and if we had $(s_{k-n+1}, e_{k+1}) = 0$, then it would follow
that vectors $e_{k+1}, \ldots, e_{k-n+2}$ are linearly dependent. Thus, in
order to check the linear independence of vectors $e_{k+1}, e_k, \ldots, e_{k-n+2}$,
it is sufficient to check condition (3.22).

Let us show that the vector system (3.21) is a basis which is dual
to the basis $e_{k+1}, \ldots, e_{k-n+2}$.

Indeed, writing $\bar{s}_{k+1-j}$ for the vector produced by the second formula in (3.21), we have

$$(s_{k+1}, e_{k+1}) = \frac{(s_{k-n+1}, e_{k+1})}{(s_{k-n+1}, e_{k+1})} = 1,$$

$$(\bar{s}_{k+1-j}, e_{k+1}) = (s_{k+1-j}, e_{k+1}) - (s_{k+1-j}, e_{k+1})\, \frac{(s_{k-n+1}, e_{k+1})}{(s_{k-n+1}, e_{k+1})} = 0,$$

$$(s_{k+1}, e_{k+1-j}) = \frac{(s_{k-n+1}, e_{k+1-j})}{(s_{k-n+1}, e_{k+1})} = 0$$

by virtue of the duality of the bases $e_k, \ldots, e_{k-n+1}$ and $s_k, \ldots, s_{k-n+1}$, and

$$(\bar{s}_{k+1-j}, e_{k+1-m}) = (s_{k+1-j}, e_{k+1-m}) - (s_{k+1-j}, e_{k+1})\, \frac{(s_{k-n+1}, e_{k+1-m})}{(s_{k-n+1}, e_{k+1})} = \delta_{jm},$$

$j, m = 1, 2, \ldots, n - 1$, where $\delta_{jm}$ is Kronecker's delta ($\delta_{jj} = 1$, $\delta_{jm} = 0$
for $j \ne m$). Consequently, the conditions of duality are satisfied
for vector systems $e_{k+1}, \ldots, e_{k-n+2}$ and $s_{k+1}, \ldots, s_{k-n+2}$, and
this corroborates our statement.

Thus, using recursive relations (3.21), the construction of basis
$s_{k+1}, \ldots, s_{k-n+2}$ (i.e. of matrix $E_{k+1}^{-1}$) is performed quite easily.
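
Relations (3.21) translate almost directly into code. The following is a minimal sketch; the list layout (newest vector first), the tolerance used for condition (3.22) and the function names are assumptions introduced here for illustration.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def update_dual_basis(s, e_new):
    """s = [s_k, ..., s_{k-n+1}] is dual to [e_k, ..., e_{k-n+1}] (newest
    first). Returns the basis dual to [e_new, e_k, ..., e_{k-n+2}] using
    the recursive relations (3.21)."""
    oldest = s[-1]                         # s_{k-n+1}
    denom = dot(oldest, e_new)             # must be nonzero, condition (3.22)
    if abs(denom) < 1e-12:
        raise ValueError("condition (3.22) fails: new system is dependent")
    s_new = [c / denom for c in oldest]    # s_{k+1}
    updated = []
    for s_j in s[:-1]:                     # s_k, ..., s_{k-n+2}
        coef = dot(s_j, e_new)
        updated.append([a - coef * b for a, b in zip(s_j, s_new)])
    return [s_new] + updated

# Start from the standard basis of R^3, which is dual to itself.
e = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
s = [row[:] for row in e]
for e_new in ([1.0, 2.0, 3.0], [2.0, -1.0, 1.0]):
    s = update_dual_basis(s, e_new)
    e = [e_new] + e[:-1]                   # drop the oldest e, add the new one

gram = [[dot(s_i, e_j) for e_j in e] for s_i in s]
print(gram)
```

Each update costs $O(n^2)$ operations, and the printed Gram matrix $(s_i, e_j)$ is the identity up to rounding, which is the duality condition.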

We can now derive a simple formula for the determination of the direction of motion $p_k$. For this purpose let us write equation (3.20) in the following form:

$$
A_k^{-1} = \sum_{i=0}^{n-1} r_{k-i}\, s_{k-i}^{*} \tag{3.23}
$$

(where the superscript $*$ denotes a row vector). Using (3.23) we obtain

$$
p_k = -A_k^{-1} f'_k = -\sum_{i=0}^{n-1} r_{k-i}\, s_{k-i}^{*} f'_k = -\sum_{i=0}^{n-1} (s_{k-i}, f'_k)\, r_{k-i}. \tag{3.24}
$$


It was this formula for the determination of the direction of descent in methods of the type (3.4) that caused such algorithms to be called methods of dual directions. Using (3.24), formula (3.4) can be written

$$
x_{k+1} = x_k - \alpha_k \sum_{i=0}^{n-1} (s_{k-i}, f'_k)\, r_{k-i} \tag{3.25}
$$

or in coordinate form

$$
x_{k+1}^{\nu} = x_k^{\nu} - \alpha_k \sum_{i=0}^{n-1} \sum_{j=1}^{n} s_{k-i}^{j}\, \frac{\partial f(x_k)}{\partial x^{j}}\, r_{k-i}^{\nu}, \quad \nu = 1, \ldots, n.
$$


In practice, sequences of approximations should be always constructed 
by this formula. 

Note also that using expression (3.23) a recursive formula can be obtained for the calculation of the matrix $A_{k+1}^{-1}$. We give it here without its derivation:

$$
A_{k+1}^{-1} = A_k^{-1} + (r_{k+1} - A_k^{-1} e_{k+1})\, s_{k+1}^{*}.
$$

Its validity is easily checked by multiplying the matrix constructed by this formula by the vectors $e_{k+1}, e_k, \ldots, e_{k-n+2}$ and using the duality relations $(s_{k+1}, e_{k+1}) = 1$, $(s_{k+1}, e_{k+1-i}) = 0$, $i = 1, \ldots, n-1$. It can be seen then that

$$
A_{k+1}^{-1} e_{k+1-i} = r_{k+1-i}, \quad i = 0, 1, \ldots, n-1,
$$

i.e. that the matrix $A_{k+1}^{-1}$ satisfies system (3.6).
The Initial Stage of the Process


Until now we considered the iterative process (3.4) beginning with $k = n - 1$, since $n$ vectors $r_i$ and the corresponding vectors $e_i$ are required for the definition of the matrix $A_k$.

The first iterations of the process ($k < n - 1$) can be performed in various ways. For instance, use can be made of the method of steepest descent: $x_{k+1} = x_k - \alpha_k f'_k$, $\alpha_k > 0$, $k = 0, 1, \ldots, n-2$. In order to secure uniformity of the algorithm from the first iteration on, we can proceed as follows with $0 \le k < n - 1$.

Set $A_0^{-1} = I$. Present this matrix in the form $A_0^{-1} = R_0 E_0^{-1}$, where $R_0 = I$, $E_0^{-1} = I$, or, using (3.23),

$$
A_0^{-1} = \sum_{i=0}^{n-1} r_{0-i}\, s_{0-i}^{*}
$$

where $r_0, r_{-1}, \ldots, r_{-n+1}$ and $s_0, s_{-1}, \ldots, s_{-n+1}$ are the vectors of an orthonormal unit basis. Taking this into account, we have:

$$
x_1 = x_0 - \alpha_0 \sum_{i=0}^{n-1} (f'_0, s_{0-i})\, r_{0-i}.
$$

Further, having calculated the vectors $r_1$ and $e_1$, by (3.21) we construct the basis:

$$
s_1 = \frac{s_{-n+1}}{(s_{-n+1}, e_1)}, \qquad s_{1-j} = s_{1-j} - (s_{1-j}, e_1)\, s_1, \quad j = 1, \ldots, n-1
$$

and the next approximation:

$$
x_2 = x_1 - \alpha_1 \sum_{i=0}^{n-1} (f'_1, s_{1-i})\, r_{1-i}.
$$


The construction of the successive iterations is straightforward. 


Minimization of Quadratic Form 


Let us consider as an example the application of the methods of dual directions to the finding of the minimum point of a quadratic function. Let

$$
f(x) = \tfrac{1}{2}(Ax, x) + (b, x) + c
$$

where $A$ is an $n \times n$ symmetric, strictly positive definite matrix with constant elements: $(Ax, x) > 0$ for any $x \ne 0$, $b$ is a vector, $c$ is a scalar quantity. The gradient of this function is $f'(x) = Ax + b$, and the vector

$$
e_i = f'(x + r_i) - f'(x) = A r_i. \tag{3.26}
$$
Therefore, if $r_1, \ldots, r_n$ is a linearly independent system of vectors and $s_1, \ldots, s_n$ is the basis dual for the basis $e_1, \ldots, e_n$, then because of (3.20) and (3.23), we have

$$
A_n^{-1} = R_n E_n^{-1} = \sum_{i=1}^{n} r_i s_i^{*}.
$$

Now, since it follows from (3.26) that the matrix $A$ is defined by the system of equations $A r_i = e_i$, $i = 1, 2, \ldots, n$, we can write

$$
A^{-1} = R_n E_n^{-1} = \sum_{i=1}^{n} r_i s_i^{*}, \tag{3.27}
$$

i.e. $A_n^{-1} = A^{-1}$. Hence

$$
x_{n+1} = x_n - A_n^{-1} f'_n = x_n - A^{-1}(A x_n + b) = -A^{-1} b \tag{3.28}
$$

and $f'_{n+1} = -A A^{-1} b + b = 0$, i.e. $x_{n+1} = x_*$.

Thus, in order to minimize a quadratic function by the method of dual directions we have to calculate the gradient of the function at $n + 1$ points and construct a basis which is dual for the basis of vectors $e_1, \ldots, e_n$. If we consider the process of successive calculations of the vectors $e_1, \ldots, e_n$ as a certain iterative procedure, then it can be said that methods of dual directions make it possible to minimize a quadratic function after a finite number of steps.

Note also that the problem under consideration is equivalent to solving a system of linear equations $Ax = -b$. Consequently, methods of dual directions make it possible to solve a system of linear equations by performing a finite number of iterations.
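The finite procedure just described can be illustrated by a minimal sketch. It rests on assumptions that are not in the text: the vectors $r_i$ are taken along the coordinate axes, the dual basis is accumulated with relations (3.21) starting from the unit basis, the columns are introduced in reverse order so that the pivots are trailing principal minors of a positive definite matrix, and the names `minimize_quadratic` and `update_dual_basis` are hypothetical. Only gradient values are used, at $n + 1$ points.

```python
from fractions import Fraction


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def update_dual_basis(s, e_new):
    """Relations (3.21): drop the oldest dual vector s[-1], prepend the new one."""
    pivot = dot(s[-1], e_new)
    if pivot == 0:
        raise ValueError("condition (3.22) is violated")
    s_new = [c / pivot for c in s[-1]]
    return [s_new] + [[c - dot(sj, e_new) * t for c, t in zip(sj, s_new)]
                      for sj in s[:-1]]


def minimize_quadratic(grad, x):
    """Minimum point of a quadratic function from n + 1 gradient evaluations."""
    n = len(x)
    unit = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    g = grad(x)                                   # f'(x)
    s = [row[:] for row in unit]                  # the unit basis is dual to itself
    for i in reversed(range(n)):
        shifted = [xj + rj for xj, rj in zip(x, unit[i])]
        e_i = [p - q for p, q in zip(grad(shifted), g)]   # (3.26): e_i = A r_i
        s = update_dual_basis(s, e_i)
    # Now s[i] is dual to e_i, and by (3.27) A^{-1} g = sum_i (s_i, g) r_i
    # with r_i the i-th coordinate vector.
    return [xj - dot(s[j], g) for j, xj in enumerate(x)]


# Hypothetical example: f(x) = (Ax, x)/2 + (b, x), gradient f'(x) = Ax + b.
A = [[4, 1], [1, 3]]
b = [-1, -2]
grad = lambda x: [dot(row, x) + bi for row, bi in zip(A, b)]
x_star = minimize_quadratic(grad, [Fraction(0), Fraction(0)])
```

The same call also solves the linear system $Ax = -b$, since the returned point is where the gradient vanishes.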


Discussion of Properties of the Methods 


Methods of dual directions make it possible to solve the problem of minimizing a strictly convex smooth function whatever the initial approximation chosen, and the rate of convergence of the sequence $\{x_k\}$ to the solution is superlinear. The method of choosing the parameter $\alpha_k$ guarantees the determination of the required value of $\alpha_k$ after a finite number of reductions. Of course, in process (3.4), as in the methods described in the preceding sections, $\alpha_k$ can be chosen under the condition of obtaining the minimum function value in the direction of motion; however, such a method is more laborious.

The methods of the class under consideration approach Newton’s 
method as to the estimate of their rate of convergence. Let us com- 
pare the labour per iteration in the methods of dual directions and 
in Newton’s method. 

In processes of type (3.4) with the matrix $A_k$ defined by system (3.6), in order to calculate the matrix $A_k^{-1}$ we have to calculate the vector $e_k$ and then, using the recursive formulas (3.21), to construct a basis dual for the basis $e_k, \ldots, e_{k-n+1}$.

For constructing the vector $e_k$, as stated in the subsection on p. 74, it is necessary to calculate the gradient of the function at one or two points. The amount of work required to construct the dual basis by formulas (3.21) is only $1/n$ of that required by the usual methods (D. K. Faddeev and V. N. Faddeeva) (it is reduced even more if we check the conditions which ensure that the denominator in the general formulas for the construction of a dual basis is nonzero).

Thus, first, the methods of dual directions, as distinct from Newton's method, do not require calculation of the second derivatives of the function. If we compare the methods of dual directions with Newton's finite-difference method, we find that the amount of computation required in the former for the construction of the matrix $A_k^{-1}$ is about $1/n$ as much, since Newton's method necessitates the calculation at each iteration of derivatives at $n + 1$ points and the inversion of the matrix $A_k$ without using recursive relations.

In Newton's finite-difference method, in order to determine the direction of motion $p_k$, instead of inverting the matrix one can solve a system of linear equations (as can be done in the usual Newton method). In such a case the quantitative estimate of the computational effort in methods of the type (3.4) and in Newton's method depends on the method of solving the system of equations; however, the ratio is again approximately equal to $n$. For instance, if for solving the system of linear equations we use the method of dual directions (see the subsection on p. 79), we have, in practice, to calculate the matrix $A_k^{-1}$ without using recursive relations, which, as was mentioned above, requires $n$ times as many calculations as formulas (3.21).

Computational effort is nearly the same in solving a system of 
linear equations by methods of conjugate directions; we shall discuss 
this method in the following section. 

Thus methods of dual directions, converging at a rate close to that of Newton's method, require at the same time far fewer calculations per iteration. The shortcoming of these methods is that their implementation on computers requires a larger storage capacity, since it is necessary to store two systems of vectors $r_k, r_{k-1}, \ldots, r_{k-n+1}$ and $s_k, s_{k-1}, \ldots, s_{k-n+1}$, i.e. actually two $n \times n$ matrices. This is an obstacle to the application of methods of dual directions to the solving of large problems on computers with a limited working storage. It should be noted, however, that this shortcoming is partially compensated if we use vectors directed along the coordinate axes as the vectors $r_k$, for in this case we have to store in the computer memory only one $n$-dimensional vector instead of the system $r_k, \ldots, r_{k-n+1}$.
4. METHODS OF CONJUGATE DIRECTIONS.
MINIMIZATION OF QUADRATIC FUNCTIONS


Conjugate Directions 
and Their Properties 


Let us turn again to the problem of minimizing quadratic functions of the form

$$
f(x) = \tfrac{1}{2}(Ax, x) + (b, x) \tag{4.1}
$$

where $(Ax, x) > 0$ with any $x \ne 0$; we considered this problem in the preceding section (the subsection on p. 79). It is easy to ascertain that the problem of quadratic function minimization can be reduced to the inversion of the matrix $A$; if the matrix $A^{-1}$ is known, then the solution is immediately found by using formulas (3.28):

$$
x_* = x_0 - A^{-1} f'_0 = -A^{-1} b \tag{4.2}
$$

where $x_0$ is an arbitrary point.

If one calculates the matrix $A^{-1}$ using expression (3.27), it is necessary, having chosen an arbitrary linearly independent system of vectors $p_0, \ldots, p_{n-1}$ (we use here the notation $p_i$ instead of $r_i$), to calculate the corresponding vectors

$$
e_i = f'(x_i + p_i) - f'(x_i) = A p_i, \quad i = 0, 1, \ldots, n-1 \tag{4.3}
$$

where $x_i$ are arbitrary points, and to construct a basis $s_0, \ldots, s_{n-1}$ dual for the basis $e_0, \ldots, e_{n-1}$, i.e. one which satisfies the conditions

$$
(s_i, e_i) = 1, \quad (s_i, e_j) = 0 \ \text{ with } i \ne j. \tag{4.4}
$$

These relations, because of (4.3), can be written in the form

$$
(s_i, A p_i) = 1, \quad (s_i, A p_j) = 0, \quad i \ne j. \tag{4.5}
$$

Of particular interest is the case in which the vectors $p_0, \ldots, p_{n-1}$ are $A$-orthogonal or, as they are sometimes called, conjugate, i.e. such that

$$
(p_i, A p_j) = 0, \quad i \ne j. \tag{4.6}
$$

The system of (nonzero) vectors $p_0, \ldots, p_{n-1}$ which satisfies conditions (4.6) is linearly independent (being orthogonal in a metric defined by a nonsingular matrix) and accordingly can be used to determine the vectors $e_i$ by formulas (4.3); the vectors $s_i$ which satisfy (4.5) can in this case be calculated by very simple formulas:

$$
s_i = \frac{p_i}{(A p_i, p_i)}, \quad i = 0, 1, \ldots, n-1. \tag{4.7}
$$
Thus if the vectors $p_0, \ldots, p_{n-1}$ are $A$-orthogonal, the matrix $A^{-1}$ is calculated by the following formula (see (3.27)):

$$
A^{-1} = \sum_{i=0}^{n-1} p_i s_i^{*} = \sum_{i=0}^{n-1} \frac{p_i p_i^{*}}{(A p_i, p_i)}, \tag{4.8}
$$

i.e. the problem of the inversion of the matrix $A$, and thus that of the minimization of the function $f(x)$, is solved quite easily.

Let us now consider the problem of determining the point $x_*$ with the aid of conjugate vectors from a somewhat different viewpoint; at the same time we shall study a number of interesting properties of conjugate directions.

Since $p_0, \ldots, p_{n-1}$ is a basis of the space $E^n$, the point $x_*$ may be presented in the following form:

$$
x_* = x_0 + \sum_{i=0}^{n-1} \alpha_i p_i. \tag{4.9}
$$

But by (4.2) and (4.8),

$$
x_* = x_0 - \sum_{i=0}^{n-1} \frac{p_i p_i^{*}}{(A p_i, p_i)}\, f'_0. \tag{4.10}
$$

It follows from (4.9) and (4.10) that

$$
x_0 + \sum_{i=0}^{n-1} \alpha_i p_i = x_0 - \sum_{i=0}^{n-1} \frac{p_i p_i^{*}}{(A p_i, p_i)}\, f'_0
$$

or in another form

$$
x_0 + \sum_{i=0}^{n-1} \alpha_i p_i = x_0 - \sum_{i=0}^{n-1} \frac{(f'_0, p_i)}{(A p_i, p_i)}\, p_i. \tag{4.11}
$$

Since a vector has only one expansion in a given basis, the last equality determines the values of the coefficients $\alpha_i$ in the expansion (4.9):

$$
\alpha_i = -\frac{(f'_0, p_i)}{(A p_i, p_i)}, \quad i = 0, 1, \ldots, n-1. \tag{4.12}
$$

Thus if a certain system of conjugate vectors is known, then the minimum point of the quadratic function (4.1) is easily found by formulas (4.9), (4.12).


The procedure of determining the point $x_*$ by formula (4.9) can be considered as a process of construction of successive points:

$$
x_{i+1} = x_i + \alpha_i p_i, \quad i = 0, 1, \ldots, n-1 \tag{4.13}
$$

where the parameters $\alpha_i$ are determined by formulas (4.12). It follows that using the method of conjugate directions one can solve the problem of quadratic function minimization after performing a finite number of steps not exceeding n (the number of distinct points in the iterative process (4.13) can prove less than $n$ if some of the coefficients $\alpha_i$ in expansion (4.9) prove equal to zero, i.e. if for some $i$ we have $(f'_0, p_i) = 0$). The above property of the method of conjugate directions is the most important one. It shows how effective the application of conjugate vectors to quadratic function minimization is; this is the reason for the wide application of methods of conjugate directions.
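Formulas (4.12) and (4.13) can be tried out on a small example. The sketch below is illustrative only: the conjugate vectors are produced by Gram-Schmidt orthogonalization of the coordinate vectors in the metric $(Au, v)$, which is merely one convenient assumption (the text has not yet discussed how conjugate vectors are to be constructed), and the function names and the sample data are hypothetical.

```python
from fractions import Fraction


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(M, v):
    return [dot(row, v) for row in M]


def a_orthogonalize(A, vectors):
    """Gram-Schmidt in the metric (Au, v): returns A-orthogonal p_0, ..., p_{n-1}."""
    ps = []
    for v in vectors:
        p = [Fraction(c) for c in v]
        for q in ps:
            Aq = matvec(A, q)
            c = Fraction(dot(v, Aq)) / dot(q, Aq)
            p = [pi - c * qi for pi, qi in zip(p, q)]
        ps.append(p)
    return ps


def conjugate_directions(A, b, x0, ps):
    """Process (4.13) with coefficients (4.12); returns x_0, x_1, ..., x_n."""
    g0 = [gi + bi for gi, bi in zip(matvec(A, x0), b)]        # f'_0 = A x_0 + b
    points = [list(x0)]
    for p in ps:
        alpha = -Fraction(dot(g0, p)) / dot(matvec(A, p), p)  # (4.12)
        points.append([xi + alpha * pi for xi, pi in zip(points[-1], p)])
    return points


A = [[4, 1], [1, 3]]
b = [-1, -2]
ps = a_orthogonalize(A, [[1, 0], [0, 1]])
points = conjugate_directions(A, b, [2, 1], ps)   # last point solves Ax = -b
```

Note that all coefficients are computed from the single gradient $f'_0$, exactly as (4.12) prescribes.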

As an interesting corollary to the result obtained, it can be shown that the point $x_i$ constructed by formulas (4.13), (4.12) is the minimum point of function (4.1) on the subspace formed by the vectors $p_0, \ldots, p_{i-1}$ and passing through the point $x_0$. Let

$$
\tilde{x}_i = x_0 + \sum_{k=0}^{i-1} \tilde{\alpha}_k p_k
$$

where $\tilde{\alpha}_k$ are arbitrary coefficients. For the point $\tilde{x}_i$ to be the minimum of a strictly convex differentiable function in the subspace formed by the vectors $p_0, \ldots, p_{i-1}$, it is necessary and sufficient (corollary 4.4 of Chap. I) that the following conditions be satisfied:

$$
(f'(\tilde{x}_i), p_j) = 0, \quad j = 0, 1, \ldots, i-1. \tag{4.14}
$$

Now for any $0 \le j \le i - 1$, we have

$$
(f'(\tilde{x}_i), p_j) = (A\tilde{x}_i + b, p_j) = \Bigl(A\Bigl(x_0 + \sum_{k=0}^{i-1} \tilde{\alpha}_k p_k\Bigr) + b,\ p_j\Bigr)
= (A x_0 + b, p_j) + \sum_{k=0}^{i-1} \tilde{\alpha}_k (A p_k, p_j) = (f'_0, p_j) + \tilde{\alpha}_j (A p_j, p_j).
$$

Hence, taking into account (4.14), we have that the point $\tilde{x}_i$ provides the minimum of the function in the subspace formed by the vectors $p_0, \ldots, p_{i-1}$ and passing through the point $x_0$ if and only if $(f'_0, p_j) + \tilde{\alpha}_j (A p_j, p_j) = 0$, i.e.

$$
\tilde{\alpha}_j = -\frac{(f'_0, p_j)}{(A p_j, p_j)}.
$$

Now these coefficients coincide with the coefficients $\alpha_j$ calculated by formula (4.12), i.e. the point $\tilde{x}_i$ which provides the required minimum coincides with the point $x_i$ of (4.13). Consequently

$$
(f'(x_i), p_j) = 0, \quad j = 0, 1, \ldots, i-1. \tag{4.15}
$$
It is now clear that minimization of a quadratic function in the space $E^n$ by formulas (4.13), (4.12) can be interpreted as a process of successive minimization of the function in subspaces of $i + 1$ dimensions, $i = 0, 1, \ldots, n-1$, it being necessary to calculate only one coefficient $\alpha_i$ in order to find every time the next minimum point.

Note that in finding $\alpha_i$ by formulas (4.12) we need not actually calculate the matrix of second derivatives $A$; it is necessary to calculate only the vectors $e_i = f'(x_i + p_i) - f'(x_i)$ (see (4.3)), i.e. only the first derivatives of the function.

It is easy to ascertain that formulas (4.12) can be transformed as follows. If $x_i$ is determined by formula (4.13), then

$$
(f'_0, p_i) = (f'_0 - f'_1 + f'_1 - \ldots - f'_i + f'_i,\ p_i)
= (-\alpha_0 A p_0 - \alpha_1 A p_1 - \ldots - \alpha_{i-1} A p_{i-1} + f'_i,\ p_i)
$$

and according to the $A$-orthogonality of the vectors $p_0, \ldots, p_i$ we have $(f'_0, p_i) = (f'_i, p_i)$.

Consequently

$$
\alpha_i = -\frac{(f'_i, p_i)}{(A p_i, p_i)} = -\frac{(f'_i, p_i)}{(e_i, p_i)}, \quad i = 0, 1, \ldots, n-1. \tag{4.16}
$$


It follows that if with a certain $0 \le i \le n - 1$ in formula (4.13) $\alpha_i = 0$ (i.e. $x_{i+1} = x_i$), then this means that $(f'_i, p_i) = 0$. Combining this equality with (4.15) we obtain

$$
(f'_{i+1}, p_j) = (f'_i, p_j) = 0, \quad j = 0, 1, \ldots, i.
$$

Thus the fact that the coefficient $\alpha_i$ becomes zero means that the corresponding point $x_i$ provides the minimum of the quadratic function in the subspace formed by the vectors $p_0, \ldots, p_i$ and passing through the point $x_0$.

Finally, note that by (4.15) $(f'_i, p_{i-1}) = 0$. This means that the choice of the coefficients $\alpha_i$ by formulas (4.12) or (4.16) corresponds to choosing $\alpha_i$ under the condition

$$
f(x_i + \alpha_i p_i) = \min_{\alpha} f(x_i + \alpha p_i).
$$
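As a small check of (4.16), the step length can be computed from two gradient evaluations alone, without access to the matrix $A$. The sketch below assumes a quadratic test function and uses the hypothetical name `step_length`; it verifies that the derivative of $f$ along $p$ vanishes at the new point, which is the condition of the one-dimensional minimum.

```python
from fractions import Fraction


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def step_length(grad, x, p):
    """alpha_i by (4.16), using two gradient evaluations and no matrix A."""
    g = grad(x)
    shifted = grad([xi + pi for xi, pi in zip(x, p)])
    e = [s - t for s, t in zip(shifted, g)]      # e_i = f'(x_i + p_i) - f'(x_i)
    return -Fraction(dot(g, p)) / dot(e, p)


# Hypothetical test function f(x) = (Ax, x)/2 + (b, x); only its gradient is used.
A = [[4, 1], [1, 3]]
b = [-1, -2]
grad = lambda x: [dot(row, x) + bi for row, bi in zip(A, b)]

x, p = [2, 1], [1, 0]
alpha = step_length(grad, x, p)
x_next = [xi + alpha * pi for xi, pi in zip(x, p)]
slope = dot(grad(x_next), p)      # derivative of f along p at the new point
```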


Construction of the Methods 


In considering in the preceding subsection the effectiveness of 
methods of conjugate directions for the minimization of a quadratic 
function, we did not even mention the methods of constructing 
such vectors and the work involved in this procedure. 

We turn to the study of methods of constructing A-orthogonal 
vectors. Each of these methods determines one or other method of 
conjugate directions, which consists in the construction of successive 
approximations to the solution of the problem of minimization of 
function (4.1) making use of formulas (4.13), (4.12) (or (4.16)). 
The effectiveness of the methods of conjugate directions depends
directly on the amount of calculations to be performed in order to 
construct the system of conjugate vectors. If the method chosen for 
constructing the conjugate vectors proves too laborious, then the 
effectiveness of the corresponding method of conjugate directions 
may prove to be low (as compared to algorithms of other classes). 
Therefore, it is worth while to specify the general requirements 
which must be satisfied by any method of constructing conjugate 
vectors in order that the corresponding method of conjugate direc- 
tions be effective. 

First, the process of constructing conjugate vectors should use 
only calculations of the function and its gradient and should not 
require the calculation of second derivatives of the function. If this 
requirement is not satisfied, then minimization of a quadratic 
function by method (4.13) can involve the need of calculating the 
matrix of second derivatives, and moreover require the calculation 
of gradients at several points. Therefore in general, a method of 
conjugate directions which requires the calculation of the matrix 
of second derivatives proves less effective than Newton’s method 
(with the possible exception only of problems in which the inversion 
of matrix A is far more laborious in comparison with its calculation). 

Secondly, the information about the function should be used only at points of the sequence (4.13). In other words, the process of constructing conjugate vectors should be such that in determining the vector $p_i$, $0 \le i \le n - 1$, the function and its gradient are evaluated only at the points $x_0, \ldots, x_i$.

It follows from this requirement that one should consider only such methods of constructing conjugate vectors that

$$
(f'_i, p_i) = 0, \quad 0 \le i \le n - 1 \tag{4.17}
$$

is satisfied if and only if $f'_i = 0$. Indeed, if condition (4.17) is satisfied, then by (4.16) $\alpha_i = 0$ and therefore in sequence (4.13) $x_{i+1} = x_i$. This means that at the $(i + 1)$-th iteration of the process we shall not receive any additional information about the function and therefore shall not be able to construct a vector $p_{i+1} \ne p_i$. The process will accordingly degenerate (stop) without reaching the solution if $f'_i \ne 0$.

Thus for any of the methods of constructing conjugate vectors (and for the corresponding method of conjugate directions), the condition

$$
(f'_i, p_i) \ne 0 \quad \text{if } f'_i \ne 0 \tag{4.18}
$$

must be satisfied. This condition guarantees that at any of the iterations of the process we shall have $\alpha_i \ne 0$.
In working out algorithms for the construction of conjugate vectors we should assume that condition (4.18) is satisfied. Then, the algorithms having been constructed, it is necessary to check whether this condition is actually satisfied and, if necessary, to impose additional constraints on the algorithm in order to satisfy the condition.

Taking into account the above remarks, let us turn to the actual working out of the relations for the construction of $A$-orthogonal vectors.

In what follows we use the notations

$$
r_i = x_{i+1} - x_i = \alpha_i p_i, \qquad e_i = f'_{i+1} - f'_i = \alpha_i A p_i. \tag{4.19}
$$

An arbitrary direction of descent of function (4.1) may be chosen to be the vector $p_0 = -H_0 f'_0$, where $H_0$ is a symmetric, strictly positive definite matrix.

Let us establish the requirements which the vector $p_k$, $1 \le k \le n - 1$, must satisfy in order to fulfill the conditions of $A$-orthogonality:

$$
(p_k, A p_j) = 0, \quad 0 \le j \le k - 1. \tag{4.20}
$$

To this end, we make use of the fact that according to the properties of conjugate directions (see (4.15)), in choosing $\alpha_i$ in process (4.13) by formula (4.16), conditions (4.20) and at the same time also the equality

$$
(f'_k, p_j) = 0, \quad 0 \le j \le k - 1 \tag{4.21}
$$

must be satisfied. If we set

$$
p_k = -H_k^{*} f'_k \tag{4.22}
$$

where $H_k$ is an $n \times n$ square matrix, then conditions (4.20) can be written in the following form:

$$
(f'_k, H_k A p_j) = 0, \quad 0 \le j \le k - 1.
$$

Comparison of the equalities obtained with (4.21) shows that if (4.21) is satisfied, then (4.20) will also be satisfied, provided the matrix $H_k$ satisfies the relations

$$
H_k A p_j = a p_j, \quad 0 \le j \le k - 1
$$

where $a$ is an arbitrary constant.

Since according to condition (4.18) and the strict convexity of function (4.1) we have $0 < |\alpha_i| < \infty$ with any $0 \le i \le n - 1$, equalities (4.20) and (4.21) can be written in the following form:

$$
(r_k, e_j) = 0, \quad 0 \le j \le k - 1, \tag{4.23}
$$

$$
(f'_k, r_j) = 0, \quad 0 \le j \le k - 1, \tag{4.24}
$$

and the conditions for determining the matrix $H_k$ can be written as follows:

$$
H_k e_j = a r_j, \quad 0 \le j \le k - 1. \tag{4.25}
$$

Thus the conditions of $A$-orthogonality (4.20) will be satisfied if the matrix $H_k$ which determines the vector $p_k$ by formula (4.22) satisfies equations (4.25).

With $k < n - 1$, the number of vector equations (4.25) will be less than $n$; it follows that the matrix $H_k$ is not uniquely defined. Besides, with different values of the constant $a$ the systems of equations for defining the matrix $H_k$ will also be different. All this suggests the diversity of algorithms which can be used to construct conjugate directions, as we have to use various methods of constructing different matrices $H_k$.

Since equations (4.25) must be satisfied with any $k = 1, 2, \ldots, n - 1$, it is natural to try and construct the matrix $H_k$ by recursive relations.

Let us write (4.25) in the following form:

$$
(H_{k-1} + \Delta H_{k-1})\, e_j = a r_j, \quad 0 \le j \le k - 1. \tag{4.26}
$$

Since the matrix $H_{k-1}$ must satisfy the equations

$$
H_{k-1} e_j = a r_j, \quad 0 \le j \le k - 2,
$$

it follows from (4.26) that the matrix $\Delta H_{k-1}$ is defined by the following conditions:

$$
\Delta H_{k-1} e_j = 0, \quad 0 \le j \le k - 2, \qquad
\Delta H_{k-1} e_{k-1} = a r_{k-1} - H_{k-1} e_{k-1}. \tag{4.27}
$$

The latter equality will evidently be satisfied if we assume

$$
\Delta H_{k-1} = a\,\frac{r_{k-1} u_{k-1}^{*}}{(u_{k-1}, e_{k-1})} - \frac{H_{k-1} e_{k-1} v_{k-1}^{*}}{(v_{k-1}, e_{k-1})} \tag{4.28}
$$

where $u_{k-1}$, $v_{k-1}$ are unknown vectors. It is necessary that the vectors be such that the first of the conditions (4.27) is satisfied, i.e.

$$
(u_{k-1}, e_j) = 0, \quad (v_{k-1}, e_j) = 0, \quad 0 \le j \le k - 2. \tag{4.29}
$$

Clearly, the vectors $u_{k-1}$, $v_{k-1}$ must also satisfy the conditions

$$
(u_{k-1}, e_{k-1}) \ne 0, \quad (v_{k-1}, e_{k-1}) \ne 0. \tag{4.30}
$$

Taking into account (4.23) it is clear that conditions (4.29) will be satisfied if we choose $u_{k-1} = v_{k-1} = r_{k-1}$. Conditions (4.30) will also be satisfied since

$$
(r_{k-1}, e_{k-1}) = (r_{k-1}, A r_{k-1}) > 0 \tag{4.31}
$$

according to the properties of the matrix $A$.
The vectors $u_{k-1}$, $v_{k-1}$ can also be chosen by using the following considerations. If condition (4.20) is satisfied, then we have

$$
(A p_{k-1}, p_j) = \Bigl(\frac{1}{\alpha_{k-1}}\, e_{k-1},\ \frac{1}{\alpha_j}\, r_j\Bigr) = 0, \quad 0 \le j \le k - 2.
$$

Making use of (4.25) we have then

$$
a\,(e_{k-1}, r_j) = (e_{k-1}, H_{k-1} e_j) = (H_{k-1}^{*} e_{k-1}, e_j) = 0, \quad 0 \le j \le k - 2.
$$

It follows that in order to satisfy (4.29) we can assume

$$
u_{k-1} = v_{k-1} = H_{k-1}^{*} e_{k-1}.
$$

In general, if we choose the vectors $u_{k-1}$ and $v_{k-1}$ in the form

$$
u_{k-1} = t_{1,k}\, r_{k-1} + t_{2,k}\, H_{k-1}^{*} e_{k-1}, \qquad
v_{k-1} = t_{3,k}\, r_{k-1} + t_{4,k}\, H_{k-1}^{*} e_{k-1} \tag{4.32}
$$

where $t_{1,k}$, $t_{2,k}$, $t_{3,k}$, $t_{4,k}$ are arbitrary numbers (which in principle can change with changing $k$), then conditions (4.29) will obviously be satisfied. In order to satisfy conditions (4.30) the quantities $t_{i,k}$, $i = 1, \ldots, 4$, should be adjusted, if necessary (in particular, as has been noted, conditions (4.30) will be satisfied with $t_{1,k} = t_{3,k} = 1$, $t_{2,k} = t_{4,k} = 0$; see (4.31)).

Thus, choosing the vectors $u_{k-1}$, $v_{k-1}$ in the form (4.32) we are able to construct the matrix $\Delta H_{k-1}$ by formula (4.28) and in this way establish the recursive relations for constructing such a matrix $H_k$ that the vector $p_k$ which it determines will satisfy the conditions of $A$-orthogonality (4.20). To each pair of vectors $u_{k-1}$, $v_{k-1}$ and constant $a$ chosen there will correspond their particular matrix $\Delta H_{k-1}$ and, consequently, matrix $H_k$. In other words, with different vectors $u_k$, $v_k$ and constant $a$ we shall obtain different algorithms for constructing conjugate vectors, i.e. shall construct different methods of conjugate directions.


General Properties of the Methods 


Let us try to establish the general properties of methods of conju- 
gate directions, which can be constructed in the manner described 
above. 

First of all, it is necessary to ascertain whether condition (4.18) 
is satisfied by the methods under consideration since in working out 
the methods of constructing the algorithms, it is assumed that this 
condition is fulfilled. 

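Another interesting question is whether the directions $p_j$, $j = 0, 1, \ldots, n - 1$, which are determined by different matrices $H_j$ differ from one another, i.e. whether the points $x_1, \ldots, x_{n-1}$ are different for different algorithms (on condition that the point $x_0$ remains the same) or they coincide.

In order to answer these questions let us present the vector $-p_j = H_j^{*} f'_j$, using the recursive formula for the matrix $H_j$ and expressions (4.28), (4.32), in the following form:

$$
H_j^{*} f'_j = (H_{j-1} + \Delta H_{j-1})^{*} f'_j.
$$

Making use of (4.24) we can write

$$
\Delta H_{j-1}^{*} f'_j = -\frac{v_{j-1}\,(e_{j-1}, H_{j-1}^{*} f'_j)}{(v_{j-1}, e_{j-1})}
= -\frac{(t_{3,j}\, r_{j-1} + t_{4,j}\, H_{j-1}^{*} e_{j-1})\,(e_{j-1}, H_{j-1}^{*} f'_j)}{(v_{j-1}, e_{j-1})}.
$$

If we also take into account that

$$
H_{j-1}^{*} e_{j-1} = H_{j-1}^{*} f'_j - H_{j-1}^{*} f'_{j-1} = H_{j-1}^{*} f'_j + p_{j-1},
$$

then the vector $-p_j$ can be written in the following form:

$$
H_j^{*} f'_j = H_{j-1}^{*} f'_j \left[1 - \frac{t_{4,j}\,(e_{j-1}, H_{j-1}^{*} f'_j)}{(v_{j-1}, e_{j-1})}\right]
- \frac{t_{3,j}\, r_{j-1} e_{j-1}^{*} + t_{4,j}\, p_{j-1} e_{j-1}^{*}}{(v_{j-1}, e_{j-1})}\, H_{j-1}^{*} f'_j
$$

$$
= \left[ I \left(1 - \frac{t_{4,j}\,(e_{j-1}, H_{j-1}^{*} f'_j)}{(v_{j-1}, e_{j-1})}\right)
- \frac{r_{j-1} e_{j-1}^{*}}{(v_{j-1}, e_{j-1})} \left(t_{3,j} + \frac{t_{4,j}}{\alpha_{j-1}}\right) \right] H_{j-1}^{*} f'_j.
$$

Further

$$
(v_{j-1}, e_{j-1}) = (t_{3,j}\, r_{j-1} + t_{4,j}\, H_{j-1}^{*} e_{j-1},\ e_{j-1})
= t_{3,j}\,(r_{j-1}, e_{j-1}) + t_{4,j}\,(e_{j-1}, H_{j-1}^{*} f'_j) + t_{4,j}\,(e_{j-1}, p_{j-1}), \tag{4.33}
$$

hence

$$
\frac{(r_{j-1}, e_{j-1})}{(v_{j-1}, e_{j-1})} \left(t_{3,j} + \frac{t_{4,j}}{\alpha_{j-1}}\right) = 1 - \frac{t_{4,j}\,(e_{j-1}, H_{j-1}^{*} f'_j)}{(v_{j-1}, e_{j-1})}.
$$

Using this expression to transform the formula for $H_j^{*} f'_j$, we obtain

$$
H_j^{*} f'_j = \gamma_j \left(I - \frac{r_{j-1} e_{j-1}^{*}}{(r_{j-1}, e_{j-1})}\right) H_{j-1}^{*} f'_j \tag{4.34}
$$

where

$$
\gamma_j = 1 - \frac{t_{4,j}\,(e_{j-1}, H_{j-1}^{*} f'_j)}{(v_{j-1}, e_{j-1})}.
$$

If the vector $v_{j-1}$ satisfies condition (4.30) and $\alpha_{j-1} t_{3,j} \ne -t_{4,j}$, then with any $j = 1, 2, \ldots$ the factor $\gamma_j \ne 0$, since

$$
\frac{t_{4,j}\,(e_{j-1}, H_{j-1}^{*} f'_j)}{(v_{j-1}, e_{j-1})} \ne 1.
$$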
This inequality is easily checked by comparing the numerator of the ratio with expression (4.33). Further, suppose that the factors $t_{3,j}$ and $t_{4,j}$ are such that with $j \ge 1$ the conditions $(v_{j-1}, e_{j-1}) \ne 0$ and $\gamma_j \ne 0$ are satisfied.

Let us perform the scalar multiplication of the two sides of equality (4.34) by $f'_k$:

$$
(f'_k, H_j^{*} f'_j) = \gamma_j \left[ (f'_k, H_{j-1}^{*} f'_j) - \frac{(f'_k, r_{j-1})\,(e_{j-1}, H_{j-1}^{*} f'_j)}{(r_{j-1}, e_{j-1})} \right]. \tag{4.35}
$$

Since $\gamma_j \ne 0$ and $(f'_k, H_j^{*} f'_j) = 0$, $(f'_k, r_j) = 0$ with $j \le k - 1$ (by (4.21) and (4.24)), it follows from (4.35) that

$$
(f'_k, H_{j-1}^{*} f'_j) = 0, \quad 1 \le j \le k - 1. \tag{4.36}
$$

Subtracting the equalities $(f'_k, H_{j-1}^{*} f'_{j-1}) = 0$, $1 \le j \le k - 1$, from (4.36) we obtain:

$$
(f'_k, H_i^{*} e_i) = 0, \quad 0 \le i \le k - 2. \tag{4.37}
$$

Let us now prove, using the relations obtained, that with $0 \le i \le j - 2$ the equalities $H_{i+1}^{*} f'_j = H_i^{*} f'_j$ hold, i.e.

$$
H_{j-1}^{*} f'_j = H_{j-2}^{*} f'_j = \ldots = H_0^{*} f'_j. \tag{4.38}
$$
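Using the recursive formula for the matrix $H_i$ and taking into account conditions (4.24), we can write the vector $H_{i+1}^{*} f'_j$ in the following form:

$$
H_{i+1}^{*} f'_j = H_i^{*} f'_j - \frac{v_i\,(H_i e_i, f'_j)}{(v_i, e_i)}, \quad 0 \le i \le j - 2. \tag{4.39}
$$

In order to prove equalities (4.38) it is necessary to show that

$$
(f'_j, H_i e_i) = 0, \quad 0 \le i \le j - 2. \tag{4.40}
$$

Using again the recursive formula for $H_i$ we obtain:

$$
(f'_j, H_{i+1} e_{i+1}) = (f'_j, H_0 e_{i+1}) - \sum_{s=0}^{i} \frac{(f'_j, H_s e_s)\,(v_s, e_{i+1})}{(v_s, e_s)}, \quad 0 \le i \le j - 2, \tag{4.41}
$$

$$
(f'_j, H_{i+1}^{*} e_{i+1}) = (f'_j, H_0^{*} e_{i+1}) - \sum_{s=0}^{i} \frac{(v_s, f'_j)\,(H_s e_s, e_{i+1})}{(v_s, e_s)}, \quad 0 \le i \le j - 2. \tag{4.42}
$$

Because of conditions (4.24) and (4.37), we have

$$
(v_i, f'_j) = t_{3,i+1}\,(r_i, f'_j) + t_{4,i+1}\,(H_i^{*} e_i, f'_j) = 0, \quad 0 \le i \le j - 2. \tag{4.43}
$$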
Taking into account equalities (4.43) and conditions (4.37), it follows from (4.42) (recall that $H_0$ is symmetric) that

$$
(f'_j, H_0 e_{i+1}) = 0, \quad 0 \le i \le j - 3. \tag{4.44}
$$

Let us now consider the relations (4.41). With $i = 0$ we have by (4.37) $(H_0 e_0, f'_j) = (H_0^{*} e_0, f'_j) = 0$ and by (4.44) $(f'_j, H_0 e_1) = 0$. Consequently, $(H_1 e_1, f'_j) = 0$. Further, we similarly establish that equalities (4.38) hold.

Taking into account these equalities, we can write expression (4.34) in the following form:

$$
H_j^{*} f'_j = \gamma_j \left(I - \frac{r_{j-1} e_{j-1}^{*}}{(r_{j-1}, e_{j-1})}\right) H_0^{*} f'_j. \tag{4.45}
$$

This is the formula that enables us to answer the questions formulated at the beginning of this subsection.

By scalar multiplication of the two sides of (4.45) by $f'_j$ and using condition (4.24), we obtain

$$
-(f'_j, p_j) = \gamma_j\,(f'_j, H_0 f'_j), \quad j \ge 0. \tag{4.46}
$$

If $H_0$ is a strictly positive definite matrix, then $(f'_j, H_0 f'_j) > 0$ with $f'_j \ne 0$. Consequently, if $\gamma_j \ne 0$, then it follows from (4.46) that $(f'_j, p_j) \ne 0$.

Thus, it follows from (4.46) that the assumption that condition (4.18) can be satisfied, used in working out the methods of constructing conjugate vectors, proves to hold if $H_0$ is a symmetric, strictly positive definite matrix.

In order to ascertain whether the vectors $p_i$ and points $x_{i+1}$, $i = 0, 1, \ldots, n - 1$, are different in different algorithms we turn again to formula (4.45).

The first step in any method of conjugate directions is the same (given the same matrix $H_0$) since $x_1 = x_0 - \alpha_0 H_0^{*} f'_0$ and $\alpha_0$ is chosen under the condition $\min_{\alpha} f(x_0 - \alpha H_0^{*} f'_0)$. Consequently, the point $x_1$ and therefore the vectors $r_0$, $e_0$, $f'_1$ will also be the same in any algorithm which can be constructed by the method described. But then, as it follows from (4.45), the direction

$$
p_1 = -H_1^{*} f'_1 = -\gamma_1 \left(I - \frac{r_0 e_0^{*}}{(r_0, e_0)}\right) H_0^{*} f'_1
$$

also will not depend on the choice of the vectors $u_0$, $v_0$ (which satisfy the requirements formulated), i.e. will not depend on the method of constructing the matrix $H_1$. To be more precise, the vectors $p_1$ obtained with different methods of constructing the matrix $H_1$ will differ only by the scalar factor $\gamma_1$. However, since the quantity $\alpha_1$ is chosen under the condition $\min_{\alpha} f(x_1 + \alpha p_1)$, the point $x_2$ which provides the minimum will be the same whatever the method of constructing the matrix $H_1$ (by virtue of the strict convexity of $f(x)$). Consequently, the quantities $r_1$, $e_1$, $f'_2$ will be the same for different methods of conjugate directions. Continuing this argument, based on expressing the vector $p_k$ by formula (4.45), we conclude that the points $x_0, x_1, \ldots, x_n$ are independent of the choice of the vectors $u_k$, $v_k$, i.e. of the method of constructing the matrix $H_k$. Thus the successive approximations to the solution of the problem of minimization of a quadratic function are the same for different methods of conjugate directions.
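The coincidence of the iterates can be observed numerically. The following sketch is an illustration under assumptions of our own: $H_0 = I$, $a = 1$, a small positive definite test matrix, exact rational arithmetic, and the hypothetical function name `run`. It applies the update (4.28) with two admissible choices of the pair $u_{k-1}$, $v_{k-1}$ from the family (4.32) and compares the resulting point sequences.

```python
from fractions import Fraction


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(M, v):
    return [dot(row, v) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def run(A, b, x0, choose_uv):
    """Conjugate directions with H_0 = I and update (4.28), a = 1; returns x_0, x_1, ..."""
    n = len(x0)
    grad = lambda z: [dot(row, z) + bi for row, bi in zip(A, b)]
    H = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    x, points = list(x0), [list(x0)]
    for _ in range(n):
        g = grad(x)
        if not any(g):                                   # f'_k = 0: solution reached
            break
        p = [-c for c in matvec(transpose(H), g)]        # p_k = -H_k^* f'_k, (4.22)
        Ap = [s - t for s, t in
              zip(grad([xi + pi for xi, pi in zip(x, p)]), g)]
        alpha = -Fraction(dot(g, p)) / dot(Ap, p)        # (4.16)
        r = [alpha * c for c in p]                       # r_k, see (4.19)
        e = [alpha * c for c in Ap]                      # e_k, see (4.19)
        x = [xi + ri for xi, ri in zip(x, r)]
        u, v = choose_uv(H, r, e)
        He, ue, ve = matvec(H, e), dot(u, e), dot(v, e)
        H = [[H[i][j] + r[i] * u[j] / ue - He[i] * v[j] / ve
              for j in range(n)] for i in range(n)]      # (4.28)
        points.append(x)
    return points


first = lambda H, r, e: (r, matvec(transpose(H), e))     # u = r, v = H^* e
second = lambda H, r, e: (r, r)                          # u = v = r

A = [[4, 1, 0], [1, 3, 1], [0, 1, 2]]
b = [-1, -2, -3]
x0 = [Fraction(0)] * 3
points_first = run(A, b, x0, first)
points_second = run(A, b, x0, second)
final = points_first[-1]
```

The matrices $H_k$ produced by the two choices differ (the second is in general not symmetric), yet the points are expected to agree, since the directions differ only by the scalar factors $\gamma_k$.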

One more remark.

It could be noted above that the first of the two matrices that form the matrix $\Delta H_j$ (4.28), $j = 0, 1, \ldots, k - 1$, takes no part in constructing the vector $p_k$. Indeed, in determining the vector $\Delta H_j^{*} f'_k$ we find according to conditions (4.24) that

$$
a\,\frac{u_j\,(r_j, f'_k)}{(u_j, e_j)} = 0, \quad 0 \le j \le k - 1,
$$

i.e. the matrices

$$
a\,\frac{r_j u_j^{*}}{(u_j, e_j)}, \quad 0 \le j \le k - 1,
$$

take no part in constructing the vector

$$
-p_k = H_k^{*} f'_k = \Bigl(H_0 + \sum_{j=0}^{k-1} \Delta H_j\Bigr)^{*} f'_k.
$$

However, they affect considerably the properties of the matrix $H_k$, in particular the properties of the matrix $H_n$. We shall take notice of this fact in studying concrete algorithms in the following subsection. Here we note only that the difference in the properties of the matrix $H_k$ tells on the properties of the methods of conjugate directions in the minimization of nonquadratic functions.


Concrete Algorithms 


Let us now consider several formulas which can be used in constructing conjugate directions. Let us repeat that each of such formulas determines a method of conjugate directions consisting in constructing successive approximations to the solution by formulas

$$
x_{k+1} = x_k + \alpha_k p_k, \quad p_k = -H_k^{*} f'_k, \quad k = 0, 1, \ldots, n - 1 \tag{4.47}
$$

where $\alpha_k$ is chosen under the condition $\min_{\alpha} f(x_k + \alpha p_k)$, i.e. is determined by expressions (4.12) or (4.16).
(1) We set in (4.28) $a = 1$, $u_{k-1} = r_{k-1}$, $v_{k-1} = H_{k-1}^{*} e_{k-1}$ (i.e. in formulas (4.32) $t_{1,k} = 1$, $t_{2,k} = 0$, $t_{3,k} = 0$, $t_{4,k} = 1$). Then

$$
H_k = H_{k-1} + \frac{r_{k-1} r_{k-1}^{*}}{(r_{k-1}, e_{k-1})} - \frac{H_{k-1} e_{k-1} e_{k-1}^{*} H_{k-1}}{(H_{k-1}^{*} e_{k-1}, e_{k-1})}. \tag{4.48}
$$

Let us study some properties of the matrix $H_k$ obtained by this method.

Matrix $H_k$ is symmetric. This fact is easily established by induction. The matrix $H_0$ is symmetric. The two matrices which form $\Delta H_0$ are symmetric too (the second one by virtue of the symmetry of $H_0$). Therefore, $H_1$ is a symmetric matrix. Similar arguments hold for any $k = 2, \ldots, n$.

Matrix $H_k$ is strictly positive definite. We give a proof by induction. Matrix $H_0$ is strictly positive definite. Let $H_k$ be a strictly positive definite matrix. Then for any $x \in E^n$

$$(H_{k+1} x, x) = (H_k x, x) + \frac{(r_k, x)^2}{(r_k, e_k)} - \frac{(H_k e_k, x)^2}{(H_k e_k, e_k)} = \frac{(H_k x, x)(H_k e_k, e_k) - (H_k e_k, x)^2}{(H_k e_k, e_k)} + \frac{(r_k, x)^2}{(r_k, e_k)}.$$

By hypothesis, there is a square root $H_k^{1/2}$ (D. K. Faddeev and V. N. Faddeeva). Consequently, taking into account the symmetry of matrix $H_k$ and writing $y = H_k^{1/2} x$, $z = H_k^{1/2} e_k$, we have

$$(H_k x, x) = (H_k^{1/2} H_k^{1/2} x, x) = (H_k^{1/2} x, H_k^{1/2} x) = (y, y);$$

similarly

$$(H_k e_k, e_k) = (H_k^{1/2} e_k, H_k^{1/2} e_k) = (z, z), \quad (H_k e_k, x) = (H_k^{1/2} e_k, H_k^{1/2} x) = (z, y).$$

Making use of these relations and applying the Cauchy-Buniakowski inequality we conclude that the following inequality holds:

$$(H_k x, x)(H_k e_k, e_k) - (H_k e_k, x)^2 = (y, y)(z, z) - (z, y)^2 \ge 0,$$

and equality holds only if $y$ and $z$ are collinear, i.e. since $H_k$ is nonsingular, only if $x = t e_k$ with some scalar $t$. But in this case, for $x \ne 0$, we have $(r_k, x)^2 = t^2 (r_k, e_k)^2 = t^2 (r_k, A r_k)^2 > 0$. Thus for any $x \ne 0$, we have

$$(H_{k+1} x, x) > 0$$

and this completes the induction.
Matrix $H_n = A^{-1}$. Indeed, $H_n$ satisfies (4.25) with $a = 1$, i.e. $H_n e_j = r_j$, $j = 0, 1, \ldots, n-1$, or making use of (4.19)

$$H_n A r_j = r_j, \quad j = 0, 1, \ldots, n-1.$$

It follows that vectors $r_0, \ldots, r_{n-1}$ are eigenvectors of matrix $H_n A$ with eigenvalues equal to unity. Hence, taking into account the linear independence of conjugate vectors $r_i$, $i = 0, 1, \ldots, n-1$, we have $H_n A = I$, i.e.

$$H_n = A^{-1}.$$

But from (4.8)

$$A^{-1} = \sum_{i=0}^{n-1} \frac{r_i r_i^*}{(r_i, A r_i)} = \sum_{i=0}^{n-1} \frac{r_i r_i^*}{(r_i, e_i)},$$

i.e. we find that matrix $H_n$ is determined only by the matrices

$$\frac{r_i r_i^*}{(r_i, e_i)}, \quad i = 0, 1, \ldots, n-1$$

(this was mentioned at the end of the preceding subsection).
(2) Another method of constructing $H_k$ is obtained if we take $a = 1$ in (4.28) and choose $u_{k-1} = v_{k-1} = r_{k-1}$. We have then

$$H_k = H_{k-1} + \frac{(r_{k-1} - H_{k-1} e_{k-1})\, r_{k-1}^*}{(r_{k-1}, e_{k-1})}. \qquad (4.49)$$

Matrix $H_k$ thus determined is no longer symmetric. Since $a = 1$, we have now $H_n = A^{-1}$, and this can be demonstrated in just the same way as for method (4.48).

Using (4.49) we can obtain a somewhat different formula for constructing $H_k$. We write (4.49) in the following form:

$$H_k = H_0 + \sum_{j=0}^{k-1} \frac{(r_j - H_j e_j)\, r_j^*}{(r_j, e_j)}. \qquad (4.50)$$

According to the conditions of conjugacy (4.20) (taking into account formulas (4.19)), we have $(e_k, r_j) = 0$, $0 \le j \le k-1$. Consequently, it follows from (4.50) that

$$H_k e_k = H_0 e_k, \quad k = 0, 1, \ldots, n-1. \qquad (4.51)$$

Thus formula (4.49) can be written as follows:

$$H_k = H_{k-1} + \frac{(r_{k-1} - H_0 e_{k-1})\, r_{k-1}^*}{(r_{k-1}, e_{k-1})}. \qquad (4.52)$$

If $H_0 = I$, this formula proves somewhat simpler than (4.49).
(3) Let us choose $a = 0$, $v_{k-1} = r_{k-1}$. In this case

$$H_k = H_{k-1} - \frac{H_{k-1} e_{k-1} r_{k-1}^*}{(r_{k-1}, e_{k-1})}. \qquad (4.53)$$

With $a = 0$ it follows from (4.25) that $H_n e_j = 0$, $j = 0, 1, \ldots, n-1$. Since vectors $e_0, \ldots, e_{n-1}$ are linearly independent, these equalities imply that $H_n = 0$ (the linear independence of vectors $e_i = A r_i$, $i = 0, 1, \ldots, n-1$, follows from the linear independence of conjugate vectors $r_i$ and the properties of matrix $A$).

Since condition (4.51) is satisfied for formula (4.53) too, the latter can be written in the following form:

$$H_k = H_{k-1} - \frac{H_0 e_{k-1} r_{k-1}^*}{(r_{k-1}, e_{k-1})}. \qquad (4.54)$$

The construction of methods of conjugate directions can be continued by choosing various combinations of constant $a$ and vectors $u_k$, $v_k$ by formulas (4.32), but we shall not do so. (Here and below, speaking of a concrete method, for example (4.48), we have in mind method (4.47) in which formula (4.48) is used for constructing matrix $H_k$.)
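The three updates (4.48), (4.49) and (4.53) can be compared on a small example. The sketch below is only an illustration under stated assumptions: the test matrix, the starting point, the choice $H_0 = I$ and all function names are chosen here and do not come from the text. It runs process (4.47) in exact rational arithmetic for $f(x) = \tfrac12 (Ax, x) + (b, x)$ and returns the iterates together with the final matrix $H_n$.

```python
from fractions import Fraction

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def transpose(M):
    return [list(col) for col in zip(*M)]

def outer(u, v, scale):
    """Matrix u v^T / scale."""
    return [[a * b / scale for b in v] for a in u]

def combine(M, N, sign):
    """M + sign * N, entry by entry."""
    return [[a + sign * b for a, b in zip(r, s)] for r, s in zip(M, N)]

def update(H, r, e, method):
    """One step H_{k-1} -> H_k by formula (4.48), (4.49) or (4.53)."""
    He = matvec(H, e)
    if method == "4.48":   # a = 1, u = r, v = H^T e
        v = matvec(transpose(H), e)
        H = combine(H, outer(r, r, dot(r, e)), 1)
        return combine(H, outer(He, v, dot(v, e)), -1)
    if method == "4.49":   # a = 1, u = v = r
        diff = [a - b for a, b in zip(r, He)]
        return combine(H, outer(diff, r, dot(r, e)), 1)
    if method == "4.53":   # a = 0, v = r
        return combine(H, outer(He, r, dot(r, e)), -1)
    raise ValueError(method)

def minimize_quadratic(A, b, x0, method):
    """Process (4.47) for f(x) = (Ax, x)/2 + (b, x), started from H_0 = I."""
    n = len(b)
    H = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    x = [Fraction(v) for v in x0]
    iterates = [x]
    for _ in range(n):
        grad = [s + t for s, t in zip(matvec(A, x), b)]
        if not any(grad):
            break
        p = [-c for c in matvec(transpose(H), grad)]    # p_k = -H_k^* f'_k
        alpha = -dot(grad, p) / dot(matvec(A, p), p)    # exact line minimum
        r = [alpha * c for c in p]
        e = matvec(A, r)                                # e_k = f'_{k+1} - f'_k
        x = [s + t for s, t in zip(x, r)]
        iterates.append(x)
        H = update(H, r, e, method)
    return iterates, H
```

On a strictly positive definite $3 \times 3$ matrix the three methods are expected to produce the same iterates, with $H_n A = I$ for (4.48) and (4.49) and $H_n = 0$ for (4.53), in agreement with the statements above.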

Let us make a remark here. Strictly speaking, it was necessary in each of the methods treated above to check whether conditions (4.30) were satisfied by vectors $u_k$, $v_k$. It is easy to ascertain that in all the methods studied these conditions are satisfied. For instance, in the case $u_k = v_k = r_k$ the fulfillment of conditions (4.30) was already mentioned in the subsection on p. 85. In method (4.48) $v_k = H_k^* e_k$; however, since matrix $H_k$ is positive definite, we have $(v_k, e_k) = (H_k^* e_k, e_k) > 0$, i.e. conditions (4.30) are also satisfied. Thus, in accordance with the results of the subsection on p. 89, condition (4.18) is satisfied by the methods discussed, i.e. the methods are guaranteed to be nondegenerate.

Let us now derive formulas directly applicable to the calculation of vectors $p_k$ defined by different matrices $H_k$. This is easily done by using formula (4.45). Since $r_{k-1} = \alpha_{k-1} p_{k-1}$, we have from (4.45)

$$p_k = -\gamma_k (H_0^* f_k' - \beta_k p_{k-1}) \qquad (4.55)$$

where

$$\beta_k = \frac{(H_0^* f_k', e_{k-1})}{(p_{k-1}, e_{k-1})}. \qquad (4.56)$$

If $v_j = H_j^* e_j$, then

$$\gamma_k = 1 - \frac{(e_{k-1}, H_{k-1}^* f_k')}{(H_{k-1}^* e_{k-1}, e_{k-1})}.$$

According to equalities (4.38) we have $(e_{k-1}, H_{k-1}^* f_k') = (e_{k-1}, H_0^* f_k')$. Further, because of (4.21) and (4.38),

$$(H_k^* e_k, e_k) = (H_0^* f_{k+1}', f_{k+1}') + (H_k^* f_k', f_k') = (H_0^* f_{k+1}', f_{k+1}') - (p_k, f_k').$$

The equalities obtained show that

$$\gamma_k = 1 - \frac{(H_0^* f_k', e_{k-1})}{(H_0^* f_k', f_k') - (p_{k-1}, f_{k-1}')}. \qquad (4.57)$$
Note that from (4.45), because of (4.21) and (4.24), it follows that (provided $\gamma_j \ne 0$)

$$(f_k', H_0^* f_j') = 0, \quad 0 \le j \le k-1. \qquad (4.58)$$

Taking this into account, we find that

$$(H_0^* f_k', e_{k-1}) = (H_0^* f_k', f_k'). \qquad (4.59)$$

Using equalities (4.59) and (4.57) we find that

$$\gamma_k = -\frac{(p_{k-1}, f_{k-1}')}{(H_0^* f_k', f_k') - (p_{k-1}, f_{k-1}')}. \qquad (4.60)$$

Note also that

$$(f_k', p_k) = (f_k', p_k) - (f_{k+1}', p_k) = -(e_k, p_k). \qquad (4.61)$$


Comparing formulas (4.56) and (4.60) and taking into account (4.59) and (4.61), it is easy to establish that $\gamma_k = \dfrac{1}{1 + \beta_k}$. Hence $\gamma_k \beta_k = 1 - \gamma_k$. Consequently, formula (4.55) which determines vector $p_k$, in the case where vector $v_{k-1} = H_{k-1}^* e_{k-1}$ is used in constructing matrix $H_k$, can be written in the form

$$p_k = -\gamma_k H_0^* f_k' + (1 - \gamma_k) p_{k-1} \qquad (4.62)$$

where coefficient $\gamma_k$ is determined by one of the formulas (4.57) or (4.60). Vector $p_k$ can be written in the alternative form:

$$p_k = -H_0^* f_k' + \delta_k (H_0^* f_k' + p_{k-1}) \qquad (4.63)$$

where

$$\delta_k = 1 - \gamma_k = \frac{(H_0^* f_k', f_k')}{(H_0^* f_k', f_k') - (p_{k-1}, f_{k-1}')}. \qquad (4.64)$$

Other expressions can be obtained for coefficient $\delta_k$ if equalities (4.59), (4.61), (4.46) are made use of; note that the last of these formulas can be written in the form

$$(f_k', p_k) = (\delta_k - 1)(H_0^* f_k', f_k'). \qquad (4.65)$$
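Recurrence (4.62) with $\gamma_k$ taken from (4.60) can be checked against the directions $p_k = -H_k^* f_k'$ produced by the matrix update (4.48). The following sketch is an illustration only; it assumes $H_0 = I$, a quadratic $f(x) = \tfrac12 (Ax, x) + (b, x)$ and exact line minimization, and the function names are hypothetical.

```python
from fractions import Fraction

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def directions_by_matrix(A, b, x0):
    """Directions p_k = -H_k f'_k of method (4.48) with H_0 = I."""
    n = len(b)
    H = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    x = [Fraction(v) for v in x0]
    dirs, grads = [], []
    for _ in range(n):
        g = [s + t for s, t in zip(matvec(A, x), b)]
        if not any(g):
            break
        p = [-c for c in matvec(H, g)]          # H_k is symmetric for (4.48)
        dirs.append(p)
        grads.append(g)
        alpha = -dot(g, p) / dot(matvec(A, p), p)
        r = [alpha * c for c in p]
        e = matvec(A, r)
        He = matvec(H, e)
        re, eHe = dot(r, e), dot(He, e)
        H = [[H[i][j] + r[i] * r[j] / re - He[i] * He[j] / eHe
              for j in range(n)] for i in range(n)]
        x = [s + t for s, t in zip(x, r)]
    return dirs, grads

def directions_by_recurrence(grads):
    """Formula (4.62) with gamma_k from (4.60), for H_0 = I."""
    dirs = [[-c for c in grads[0]]]
    for k in range(1, len(grads)):
        p_prev, g_prev, g = dirs[-1], grads[k - 1], grads[k]
        gamma = -dot(p_prev, g_prev) / (dot(g, g) - dot(p_prev, g_prev))
        dirs.append([-gamma * a + (1 - gamma) * c for a, c in zip(g, p_prev)])
    return dirs
```

In exact arithmetic the two lists of directions should coincide, so the matrix $H_k$ never has to be stored when (4.62) is used.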


Calculating coefficients $\gamma_k$, $\delta_k$ in expressions (4.62), (4.63) by different formulas, we obtain in practice different methods of conjugate directions. It should be stressed that in the minimization of nonquadratic functions different formulas for $p_k$ determine different vectors (both in magnitude and in direction). The construction of vector $p_k$ using expression (4.45) is especially simple if vector $v_{k-1} = r_{k-1}$ is chosen in constructing matrix $H_k$. In this case (4.45) gives $\gamma_k = 1$, and from (4.55) we obtain

$$p_k = -H_0^* f_k' + \beta_k p_{k-1} \qquad (4.66)$$

where $\beta_k$ is calculated by formula (4.56). If we use equalities (4.59), (4.61) and (4.46), the last of which in the case under consideration takes the form

$$(p_k, f_k') = -(H_0^* f_k', f_k'), \qquad (4.67)$$

then for determining coefficient $\beta_k$ one of the following formulas can be obtained:

$$\beta_k = \frac{(H_0^* f_k', e_{k-1})}{(p_{k-1}, e_{k-1})} = -\frac{(H_0^* f_k', f_k')}{(p_{k-1}, f_{k-1}')} = \frac{(H_0^* f_k', f_k')}{(H_0^* f_{k-1}', f_{k-1}')}. \qquad (4.68)$$


Expressions (4.62), (4.63), (4.66) which determine vector $p_k$ can in their turn be given the form $p_k = -H_k^* f_k'$, where matrix $H_k$ depends on the coefficients $\gamma_k$, $\delta_k$, $\beta_k$. For instance, if coefficient $\delta_k$ in (4.63) is calculated by formula (4.64), then the corresponding matrix is

$$H_k = H_0 - \frac{H_0^* f_k'\, (H_0^* f_k' + p_{k-1})^*}{(H_0^* f_k', f_k') - (p_{k-1}, f_{k-1}')} \qquad (4.69)$$

where $H_n = H_0$ (since $f_n' = 0$). If in (4.66) $\beta_k$ is calculated by the first of formulas (4.68), then vector $p_k$ is determined by matrix

$$H_k = H_0 - \frac{H_0 e_{k-1}\, p_{k-1}^*}{(p_{k-1}, e_{k-1})} \qquad (4.70)$$

and if use is made of the second of formulas (4.68), then

$$H_k = H_0 + \frac{H_0^* f_k'\, p_{k-1}^*}{(p_{k-1}, f_{k-1}')}. \qquad (4.71)$$

Note that in formula (4.71) $H_n = H_0$ (since $f_n' = 0$); in (4.70) $H_n \ne H_0$.
The reader can obtain other formulas for constructing $H_k$. The simplest formula for calculating $A$-orthogonal vectors can be obtained by choosing $H_0 = I$ in (4.66). In this case

$$p_k = -f_k' + \beta_k p_{k-1} \qquad (4.72)$$

where $\beta_k$ is determined, for instance, by one of the following formulas:

$$\beta_k = \frac{(f_k', e_{k-1})}{(p_{k-1}, e_{k-1})} = -\frac{(f_k', f_k')}{(p_{k-1}, f_{k-1}')} = \frac{(f_k', f_k')}{(f_{k-1}', f_{k-1}')}. \qquad (4.73)$$

Method (4.47) in which conjugate vectors are constructed by (4.72), (4.73) is widely known as the method of conjugate gradients (this name is due to conditions (4.58)).
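A minimal sketch of the method of conjugate gradients (4.72), (4.73) follows. It is an illustration under assumptions made here: the quadratic is $f(x) = \tfrac12 (Ax, x) + (b, x)$, the line minimum is computed exactly, rational arithmetic is used so that the three expressions (4.73) can be compared without rounding, and the function names are hypothetical.

```python
from fractions import Fraction

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def conjugate_gradients(A, b, x0):
    """Method (4.72); records beta_k computed by each formula in (4.73)."""
    x = [Fraction(v) for v in x0]
    g = [s + t for s, t in zip(matvec(A, x), b)]     # f'_0
    p = [-c for c in g]                              # p_0 = -f'_0
    betas = []
    for _ in range(len(b)):
        if not any(g):
            break
        alpha = -dot(g, p) / dot(matvec(A, p), p)    # exact line minimum
        x = [s + alpha * t for s, t in zip(x, p)]
        g_new = [s + t for s, t in zip(matvec(A, x), b)]
        if any(g_new):
            e = [s - t for s, t in zip(g_new, g)]    # e_{k-1} = f'_k - f'_{k-1}
            forms = (dot(g_new, e) / dot(p, e),
                     -dot(g_new, g_new) / dot(p, g),
                     dot(g_new, g_new) / dot(g, g))
            betas.append(forms)
            p = [-s + forms[0] * t for s, t in zip(g_new, p)]
        g = g_new
    return x, g, betas
```

For a quadratic function the three values in each recorded triple should agree; for nonquadratic functions they generally differ, which is why the choice of formula then defines different methods.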
Minimization of a Convex Quadratic Function


Until now we considered methods of $A$-orthogonal directions for the minimization of a strictly convex quadratic function, i.e. assumed matrix $A$ to be strictly positive definite.

Let now the function

$$f(x) = \tfrac12 (Ax, x) + (b, x) + c$$

be convex, i.e. let matrix $A$ be positive definite in the non-strict sense: $(Ax, x) \ge 0$ for any $x \ne 0$. Suppose that this function has a minimum.

Let us study the application of methods of conjugate directions in this case. Consider first certain properties of function $f(x)$.


(1) If $(Ap, p) = 0$ then of necessity

$$Ap = 0. \qquad (4.74)$$

In fact, if $(Ap, p) = 0$, then $p$ is a minimum point of the convex function $\varphi(x) = \tfrac12 (Ax, x)$. But at a minimum point the necessary condition of an extremum

$$\varphi'(p) = Ap = 0$$

must be satisfied.

(2) If $p$ is a minimum point of the convex function $\varphi(x) = \tfrac12 (Ax, x)$, then of necessity

$$(b, p) = 0. \qquad (4.75)$$

In fact, if $(Ap, p) = 0$ and $(b, p) > 0$, then $f(\alpha p) = \alpha (b, p) + c \to -\infty$ as $\alpha \to -\infty$, i.e. $f(x)$ does not attain a minimum, and this contradicts the assumption.

The case where $(b, p) < 0$ is treated in a similar manner.

(3) The minimum point of function $f(x)$ is not unique.

Indeed, any minimum point of a convex quadratic function $f(x)$ must be a solution of the linear system $Ax + b = 0$ and conversely, since the condition $f'(x) = Ax + b = 0$ is a necessary and (since there is a minimum of $f(x)$) sufficient condition for an extremum of the convex function $f(x)$ (corollary 3.2 of Chap. I). However, the rank of matrix $A$ is lower than the number of unknowns (since $f(x)$ is not strictly convex, $(Ap, p) = 0$ for some $p \ne 0$, and then matrix $A$ is singular by (4.74)), and so the solution of the system $Ax + b = 0$ is not unique.

(4) If $(Ap, p) = 0$ and $x \in E^n$ is an arbitrary point, then of necessity

$$(f'(x), p) = 0. \qquad (4.76)$$

Indeed, according to conditions (4.74) and (4.75)

$$(f'(x), p) = (Ax + b, p) = (Ap, x) + (b, p) = 0.$$


Equality (4.76) can be interpreted as follows. The set of solutions of the problem of minimization of the function $\varphi(x)$ forms an $(n - q)$-dimensional hyperplane, where $q$ is the rank of matrix $A$. (This hyperplane belongs to a level surface of function $f(x)$, since if $p$ is an arbitrary minimum point of $\varphi(x) = \tfrac12 (Ax, x)$ then using (4.75) we have

$$f(p) = \tfrac12 (Ap, p) + (b, p) + c = c.)$$

Consequently, equality (4.76) means that the gradient of function $f(x)$ at any point lies in a $q$-dimensional subspace which is orthogonal to the plane $Ap = 0$. It follows that the number of linearly independent vectors $f'(x)$ is equal to $q < n$ (for a convex quadratic function $f(x)$).
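
As a small illustration of properties (1)-(4) (the numbers are chosen here for the example and do not come from the text), take $n = 3$ and

$$A = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad b = (-2, -1, 0)^*, \quad c = 0.$$

Here $q = 2$. For $p = (0, 0, 1)^*$ we have $(Ap, p) = 0$, and indeed $Ap = 0$ and $(b, p) = 0$. The gradient is $f'(x) = (2x_1 - 2,\ x_2 - 1,\ 0)^*$, so that $(f'(x), p) = 0$ at every point $x$ and all gradients lie in the two-dimensional subspace $x_3 = 0$. The minimum points are $(1, 1, t)^*$ with arbitrary $t$, i.e. the minimum point is not unique.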

Having in mind the above properties of function $f(x)$, let us turn again to the application of methods of conjugate directions to the problem under consideration.

For simplicity, suppose that $H_0 = I$ and consider method (4.72). We denote the subspace which contains vectors $f'(x)$ by $E^q$. It is easy to ascertain that vector $p_k$ defined by formula (4.72) belongs to $E^q$. Indeed, $p_0 = -f_0' \in E^q$ and consequently, for any $k$, vector $p_k$ is a linear combination of vectors which belong to subspace $E^q$. Consequently, the minimization of the function (process (4.47)) by method (4.72) is in effect performed in subspace $E^q$. Now in this subspace, condition $(Ax, x) > 0$ is satisfied for any $x \ne 0$. This means that, owing to space $E^q$ being finite-dimensional, we have for any $x \in E^q$

$$m_1 \|x\|^2 \le (Ax, x) \le M_1 \|x\|^2, \quad m_1 > 0, \quad M_1 < \infty.$$

Hence function $f(x)$ is strictly convex in the subspace under consideration, therefore all of the properties of methods of conjugate directions discussed in the preceding subsections hold also in our case. In particular, equalities (4.58), which show that the nonzero vectors $f_i'$, $i = 0, 1, \ldots, k$, are linearly independent, hold in our case. However, there cannot be more than $q$ linearly independent vectors in subspace $E^q$. Consequently, the process of constructing conjugate vectors must be truncated after at most $q$ vectors $p_0, \ldots, p_k$, $k \le q - 1$, have been constructed. Since the method is nondegenerate (property (4.18)), this will occur only if the next gradient vanishes, $f_{k+1}' = 0$.

It is clear from what has been stated above that in minimizing a function by method (4.72) we shall of necessity find that $f_k' = 0$ at a certain $k \le q$. Since directions $p_k$ determined by different methods of conjugate directions coincide (up to a scalar factor), everything stated above holds not only for method (4.72) but also for other algorithms studied in this section. By a somewhat more complicated reasoning it can be shown that the result obtained holds also in the case where $H_0$ is an arbitrary strictly positive definite matrix.

Thus, methods of conjugate directions make it possible to find a minimum point of a convex quadratic function, and the solution of the problem is obtained after fewer than $n$ steps.

We suppose now that the convex function $f(x)$ does not attain a minimum; this will be the case, as can be seen from the proof of equality (4.75), if some vector $p$ which minimizes function $\varphi(x) = \tfrac12 (Ax, x)$ is such that $(b, p) \ne 0$.

Let us consider what the processes of constructing conjugate directions lead to in this case.

We first describe one property of $A$-orthogonal vectors which holds under the condition that $f(x)$ is a convex (but not strictly convex) function.

If matrix $A$ is positive definite but not strictly positive definite, then at least one of the conjugate vectors $p_k$, $0 \le k \le n - 1$, satisfies the condition

$$(Ap_k, p_k) = 0. \qquad (4.77)$$


That this statement holds is shown as follows. If the conditions

$$(Ap_i, p_i) > 0 \qquad (4.78)$$

were satisfied for any $i = 0, 1, \ldots, n - 1$, then the system of vectors $p_0, p_1, \ldots, p_{n-1}$ would be linearly independent. Indeed, suppose that conditions (4.78) are satisfied and that

$$\sum_{i=0}^{n-1} \delta_i p_i = 0$$

where, for example, $\delta_0 \ne 0$. Then by (scalar) multiplication of both sides of the equality by $Ap_0$ we obtain on the left-hand side $\delta_0 (Ap_0, p_0) \ne 0$, i.e. the result is a contradiction. Consequently, with conditions (4.78) satisfied, vectors $\{p_i\}_{i=0}^{n-1}$ would form a basis in $E^n$ and therefore any vector $x \ne 0$ could be written in the form

$$x = \sum_{i=0}^{n-1} a_i p_i$$

where at least one of the coefficients $a_i \ne 0$. However, in this case we would have

$$(Ax, x) = \Bigl(A \sum_{i=0}^{n-1} a_i p_i, \sum_{i=0}^{n-1} a_i p_i\Bigr) = \sum_{i=0}^{n-1} a_i^2 (Ap_i, p_i) > 0.$$

This contradicts the assumption that $(Ax, x) = 0$ for some $x \ne 0$.

Thus the initial assumption that conditions (4.78) are fulfilled is false.

According to the property of $A$-orthogonal vectors discussed, in applying any method of conjugate directions we shall find equality (4.77) satisfied at a certain $k \ge 0$. Since for the function under consideration properties (4.75), (4.76) do not hold, parameter $\alpha_k$, calculated by formula (4.16) with (4.77) satisfied, will become infinite, i.e. the further construction of conjugate directions will prove impossible.
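
The two cases of this subsection can be reproduced on a small example. The sketch below is an illustration under assumptions made here: method (4.72) is applied with $\beta_k$ from the third formula of (4.73), exact rational arithmetic is used, and the function name, the matrix and the vectors $b$ are hypothetical test data. The routine stops either when $f_k' = 0$ or when $(Ap_k, p_k) = 0$ with $f_k' \ne 0$, in which case $\alpha_k$ would be infinite.

```python
from fractions import Fraction

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def conjugate_gradients_semidefinite(A, b, x0):
    """Method (4.72) for f(x) = (Ax, x)/2 + (b, x), A positive semidefinite.

    Returns (status, k, x), where status is "minimum" if f'_k = 0 and
    "unbounded" if (A p_k, p_k) = 0 while f'_k != 0.
    """
    x = [Fraction(v) for v in x0]
    g = [s + t for s, t in zip(matvec(A, x), b)]
    p = [-c for c in g]
    for k in range(len(b) + 1):
        if not any(g):
            return "minimum", k, x
        curvature = dot(matvec(A, p), p)
        if curvature == 0:
            return "unbounded", k, x      # condition (4.77): alpha_k is infinite
        alpha = -dot(g, p) / curvature
        x = [s + alpha * t for s, t in zip(x, p)]
        g_new = [s + t for s, t in zip(matvec(A, x), b)]
        beta = dot(g_new, g_new) / dot(g, g)
        p = [-s + beta * t for s, t in zip(g_new, p)]
        g = g_new
    raise RuntimeError("no termination within n + 1 steps")
```

With $A = \mathrm{diag}(2, 1, 0)$ and $b = (-2, -1, 0)^*$ the function has a minimum and the gradient vanishes after two line minimizations, which is the rank of $A$; with $b = (-2, -1, 1)^*$ the minimum is not attained and the routine meets a direction with $(Ap_k, p_k) = 0$.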


Discussion of Results 


Thus, we have considered a general scheme of constructing methods of conjugate directions and on its basis obtained many concrete algorithms. Any of these methods makes it possible to find the minimum of a convex quadratic function after a number of steps in process (4.47) not exceeding $n$. Besides, we have made it clear that the successive approximations to the solution $x_0, x_1, \ldots, x_n$ obtained by using different algorithms prove to be identical.

If algorithms are judged by the amount of calculation per iteration, then algorithms (4.62), (4.63), (4.66) should certainly be preferred. These methods are especially easy to implement if the identity matrix $I$ is chosen as the initial matrix $H_0$, and it seems that for most problems the choice $H_0 = I$ is the most expedient.

In this case methods (4.63), (4.66) differ but slightly from the gradient method in the work per iteration, but are considerably more effective owing to the fact that process (4.47) proves to have a finite number of steps.

The advantage of methods (4.62), (4.63) and (4.66) when implemented on computers is that they require only slightly more storage than the methods of steepest descent.

Methods of conjugate directions in which matrices ((4.48), (4.49), (4.52)-(4.54)) are first constructed in order to determine the direction of motion are somewhat inferior to methods (4.63), (4.66) in the aspects considered; however, they retain their considerable advantage over gradient methods. All the methods of the class under consideration have an advantage over Newton's method in that they do not require the calculation of second derivatives of the function.

The question can be raised whether it is worth while to consider methods which use preliminary construction of matrices if they are inferior to methods (4.63), (4.72) in the computational effort and computer memory required.

However, it should be kept in mind that we are making a purely theoretical estimate of the methods and do not take into account such an important factor as the sensitivity of an algorithm to errors in computations. This factor can change considerably the relation between the amounts of computation involved in solving the problem by different algorithms. It should also be noted that methods (4.48), (4.49), (4.52), for instance, in solving a minimization problem make it possible to obtain at the same time the inverse matrix $A^{-1}$, and this may be useful in some cases.

The difference in the properties of the algorithms shows considerably when they are used for the minimization of nonquadratic functions; this will be discussed in the next section.

Methods of conjugate directions prove useful in one more respect: they make it possible to establish whether matrix $A$ is sign-definite. Thus according to the results of the subsection on p. 99, if matrix $A$ is positive definite and function $f(x)$ does not attain a minimum, then at a certain step we shall have $\alpha_k = \infty$. However, if matrix $A$ is not positive definite, we find at a certain step of process (4.47) that $\alpha_k < 0$. Thus the value of parameter $\alpha_k$ determines the sign of matrix $A$.

The effectiveness of methods of conjugate directions is the reason for their more and more extensive application to the minimization of quadratic functions and the solution of systems of linear equations.


5. METHODS OF CONJUGATE DIRECTIONS.
MINIMIZATION OF ARBITRARY FUNCTIONS


Considerations about the Applicability 
of the Methods 


Suppose that we intend to make use of the process

$$x_{k+1} = x_k + \alpha_k p_k, \quad p_k = -H_k^* f_k', \quad k = 0, 1, \ldots \qquad (5.1)$$

where vector $p_k$ (or matrix $H_k$) is determined by one of the methods studied in the preceding section, for the minimization of an arbitrary (nonquadratic) convex function $f(x)$. In this case, matrix $f''(x)$ will have different elements at different points of sequence (5.1); by virtue of this fact vectors $p_0, \ldots, p_k$ constructed by any of the methods described in the subsection on p. 93 will not satisfy conditions (4.20), i.e. will not be conjugate. However, if the initial point $x_0$ is in a close neighbourhood of the minimum $x_*$ of a smooth convex function $f(x)$, then at any point of this region matrix $f''(x)$ is close enough to matrix $f''(x_*)$, i.e. the quadratic function

$$\varphi(x) = \tfrac12 \bigl(f''(x_*)(x - x_*), x - x_*\bigr) + f(x_*)$$

is a good approximation to the function $f(x)$. Thus, we can expect that the properties of vectors $p_0, \ldots, p_k$ determined by the methods of Sec. 4 will be close enough to the properties of conjugate ($f''(x_*)$-orthogonal) vectors, and therefore the properties of process (5.1), in which parameter $\alpha_k$ is chosen on condition that the minimum of function $f(x)$ in the direction of $p_k$ is attained, will be close enough to the properties of methods of conjugate directions. In other words, we can expect the methods of the preceding section to prove sufficiently effective in minimizing nonquadratic functions too. In this case, the methods will no longer yield the result after a finite number of steps, since the conditions

$$(f''(x_*) p_k, p_i) = 0, \quad i \ne k$$

will not be strictly satisfied for an arbitrary initial point $x_0$.

Iterative processes of the type (5.1) in which vector $p_k$ is constructed by the algorithms of Sec. 4 and the value of parameter $\alpha_k$ is chosen on condition that

$$f(x_k + \alpha_k p_k) = \min_\alpha f(x_k + \alpha p_k)$$

will be called, as before, methods of conjugate directions. Note that the condition under which parameter $\alpha_k$ is chosen can also be written in the following form:

$$(f_{k+1}', p_k) = (f'(x_k + \alpha_k p_k), p_k) = 0. \qquad (5.2)$$

The object of this section is to substantiate the convergence of methods of conjugate directions in the minimization of nonquadratic functions and to obtain bounds on the rate of convergence.
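
Process (5.1) for a nonquadratic function can be sketched as follows. This is a minimal illustration under assumptions made here, not a definitive implementation: directions are built by (4.72) with $\beta_k$ from the third formula of (4.73), the direction is reset to $-f_k'$ every $n$ steps, condition (5.2) is solved by bisection (which relies on $f$ being convex along the line), and the test function and all names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def line_minimum(grad, x, p):
    """Solve (f'(x + alpha p), p) = 0, condition (5.2), by bisection."""
    def slope(alpha):
        return dot(grad([s + alpha * t for s, t in zip(x, p)]), p)
    hi = 1.0
    while slope(hi) < 0:          # expand until the minimum is bracketed
        hi *= 2.0
    lo = 0.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if slope(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def cg_with_restarts(grad, x0, tol=1e-9, max_iter=500):
    """Process (5.1) with directions (4.72), restarted every n steps."""
    n = len(x0)
    x = list(x0)
    g = grad(x)
    p = [-c for c in g]
    for k in range(max_iter):
        if math.sqrt(dot(g, g)) < tol:
            return x, k
        alpha = line_minimum(grad, x, p)
        x = [s + alpha * t for s, t in zip(x, p)]
        g_new = grad(x)
        if (k + 1) % n == 0:
            p = [-c for c in g_new]                   # restart with H_0 = I
        else:
            beta = dot(g_new, g_new) / dot(g, g)      # third formula of (4.73)
            p = [-s + beta * t for s, t in zip(g_new, p)]
        g = g_new
    return x, max_iter

def example_gradient(x):
    """Gradient of sum(c_i x_i^2/2 + x_i^4/4 - d_i x_i) + (sum x_i)^2/2."""
    c, d = (1.0, 2.0, 3.0), (1.0, -1.0, 2.0)
    total = sum(x)
    return [ci * xi + xi ** 3 - di + total for ci, di, xi in zip(c, d, x)]
```

A call such as `cg_with_restarts(example_gradient, [0.0, 0.0, 0.0])` returns an approximate minimum point and the number of line minimizations used. Bisection is chosen here only because it is simple and reliable for a convex function; any sufficiently accurate one-dimensional minimization could be substituted.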


Theorem on Convergence of the Methods 


In what follows we shall assume that $f(x)$ is a strongly convex differentiable function whose first and second derivatives are continuous, i.e. that conditions

$$m \|y\|^2 \le (f''(x) y, y) \le M \|y\|^2, \quad m > 0 \qquad (5.3)$$

are satisfied for all $x, y \in E^n$, and that a symmetric, strictly positive definite matrix has been chosen as $H_0$, i.e.

$$m_0 \|y\|^2 \le (H_0 y, y) \le M_0 \|y\|^2, \quad m_0 > 0 \qquad (5.4)$$

for all $y \in E^n$.

Processes of type (5.1) can be realized either with restoration of matrix $H_k$ after a finite number of steps, or without such a reinitialization. Speaking of processes with restoration, say, after $m$ steps we mean that for any $\xi = 0, 1, \ldots$ matrix $H_{\xi m}$ is restored, i.e. $H_{\xi m} = H_0$.
Let us note the following fact from the outset. If a process with restoration of matrix $H_k$ after a finite number of steps is being realized, then for any of the methods of conjugate directions the following condition is fulfilled:

$$\lim_{k \to \infty} \|f'(x_k)\| = 0 \qquad (5.5)$$

since each first step of the process after restoration is a step of gradient descent, for which according to (5.3) the conditions of convergence of gradient methods (theorem 1.6) are satisfied, and in the following steps between restorations we have a descent to the minimum of the function in the direction of motion. The fulfillment of condition (5.5) for a strictly convex function means that any of the methods discussed in Sec. 4, if realized with restoration of matrix $H_k$ after a finite number of steps, converges to the solution $x_*$. Therefore in order to judge the effectiveness of such a process it is important to obtain bounds on its rate of convergence.

Note that condition (5.5) for processes with restoration is satisfied not only for strictly convex functions but for any function for which the fulfillment of (5.5) is guaranteed in applying gradient methods (see theorem 1.4).

However, if processes are realized without restoration of $H_k$, then their convergence must be substantiated. Besides, it is also necessary to estimate their rate of convergence.

Let us now formulate the theorem which is the main result of this section.

Theorem 5.1. For the minimization of function $f(x)$ which satisfies conditions (5.3) let there be applied process (5.1) in which the construction of matrix $H_k$ is performed by one of the methods of Sec. 4 ((4.48), (4.49), (4.52)-(4.54), (4.69)-(4.71)) with restoration of $H_k$ after $n$ steps. If the value of $\alpha_k$ is chosen under the condition that the minimum of the function in the direction of $p_k$ be attained, then the sequence $\{x_{\xi n}\}$, whatever the initial point $x_0$ chosen, converges to the solution at a superlinear rate.

Let us outline the general scheme of the proof of this theorem. Suppose that the assertion is not true, i.e. that for the iterative processes described the following condition is satisfied for any $k$:

$$\|x_{k+1} - x_*\| \ge \lambda \|x_k - x_*\| \qquad (5.6)$$

where $\lambda > 0$ is a constant. Using inequality (1.12) and the expression

$$\|f'(x)\| = \|f'(x) - f'(x_*)\| \le M \|x - x_*\| \qquad (5.7)$$

which hold for a function which satisfies condition (5.3), we find that condition (5.6) is equivalent to the following one:

$$\|f_{k+1}'\| \ge \delta \|f_k'\| \qquad (5.8)$$

where $\delta > 0$ is a constant.
Studying the properties of process (5.1) and assuming that condition (5.8) is fulfilled, we find that independently of what algorithm is used for constructing matrix $H_k$ the following estimates hold:

$$C \|f_k'\| \le \|r_k\| \le N \|f_k'\| \qquad (5.9)$$

where $C$, $N$ are constants independent of $k$, $C > 0$, and

$$(e_{\xi n+i}, r_{\xi n+j}) = o(\|e_{\xi n+i}\|\, \|r_{\xi n+j}\|), \quad i \ne j, \quad 0 \le i, j \le n-1. \qquad (5.10)$$

It will be demonstrated below that if these estimates are fulfilled, then sequence (5.1) converges to the solution at a superlinear rate. However, this contradicts our initial assumption (5.6) (or (5.8)), i.e. condition (5.6) cannot be satisfied for process (5.1). Using this fact it will be easy to establish that the theorem holds.

Thus, the pattern of the proof is the same for all the methods discussed, but the validity of (5.9) and (5.10) is established in different ways. The proof that these estimates hold for different algorithms will be given in the next subsection, and here we shall describe the part of the proof which all these methods have in common.

We shall first make a remark about the notation. In what follows, for simplicity, in using vectors and parameters $r_{\xi n+i}$, $f_{\xi n+i}'$, $e_{\xi n+i}$, $\alpha_{\xi n+i}$, $\beta_{\xi n+i}$, $i = 0, 1, \ldots, n-1$, we shall often omit index $\xi n$, i.e. operate with vectors and parameters $r_i$, $f_i'$, $e_i$, $\alpha_i$, $\beta_i$, etc. However, it should be stressed that this is done only to simplify the written form; the real index of the corresponding quantity is $\xi n + i$.

We turn now to the proof of the theorem and assume that estimates (5.9) and (5.10) are satisfied. Using Lagrange's formula for operators we obtain

$$(e_i, r_j) = (f_{ic}'' r_i, r_j) = (f_i'' r_i, r_j) + ((f_{ic}'' - f_i'') r_i, r_j) \qquad (5.11)$$

where, as usual, index $ic$ denotes an intermediate point in the corresponding segment:

$$x_{ic} = x_i + \theta r_i, \quad 0 \le \theta \le 1.$$

If $\|r_i\| \to 0$, then because of the uniform continuity of the second derivatives of function $f(x)$ on the set $S = \{x: f(x) \le f(x_0)\}$ we have $\|f_{ic}'' - f_i''\| \to 0$, and it follows from (5.11) that, if (5.10) is satisfied, the estimates

$$(f_i'' r_i, r_j) = o(\|r_i\|\, \|r_j\|) + o(\|e_i\|\, \|r_j\|)$$

hold too.

Under conditions (5.3) $\|e_i\| = \|f_{i+1}' - f_i'\| \le M \|r_i\|$, consequently $\|e_i\|$ and $\|r_i\|$ are of the same order of smallness. Taking this into account, we have

$$(f_i'' r_i, r_j) = o(\|r_i\|\, \|r_j\|), \quad i \ne j, \quad 0 \le i, j \le n-1. \qquad (5.12)$$

If estimates (5.12) are fulfilled, then there are vectors

$$\bar r_i = r_i + \omega_i, \quad i = 0, 1, \ldots, n-1, \qquad (5.13)$$

where $\|\omega_i\| = o(\|r_i\|)$, such that

$$(f_{\xi n}'' \bar r_i, r_j) = 0, \quad i \ne j, \quad 0 \le i, j \le n-1. \qquad (5.14)$$


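This can be shown as follows. Let us normalize vectors $r_i$:

$$\tilde r_i = \frac{r_i}{(f_{\xi n}'' r_i, r_i)^{1/2}}.$$

Then $(f_{\xi n}'' \tilde r_i, \tilde r_i) = 1$, and as $\xi \to \infty$, since process (5.1) is convergent (with restoration of $H_k$) and by (5.3) and (5.12), we have

$$(f_{\xi n}'' \tilde r_i, \tilde r_j) = \frac{(f_i'' r_i, r_j) + ((f_{\xi n}'' - f_i'') r_i, r_j)}{(f_{\xi n}'' r_i, r_i)^{1/2} (f_{\xi n}'' r_j, r_j)^{1/2}} \to 0, \quad i \ne j, \quad 0 \le i, j \le n-1.$$

Therefore if $R_{\xi n}$ is a matrix whose columns are vectors $\tilde r_i$ and $F_{\xi n} = R_{\xi n}^* f_{\xi n}'' R_{\xi n}$, then as $\xi \to \infty$

$$F_{\xi n} \to I.$$

Since $F_{\xi n}^{-1} R_{\xi n}^* f_{\xi n}'' R_{\xi n} = I$ we obtain, denoting $F_{\xi n}^{-1} R_{\xi n}^*$ by $Q_{\xi n}^*$,

$$Q_{\xi n}^* f_{\xi n}'' R_{\xi n} = I. \qquad (5.15)$$

Now, since $F_{\xi n} \to I$, we have also $F_{\xi n}^{-1} \to I$ and, consequently, $R_{\xi n} - Q_{\xi n} \to 0$, i.e. the column vectors $q_i$ of matrix $Q_{\xi n}$ can be written in the form

$$q_i = \tilde r_i + \sigma_i, \quad i = 0, 1, \ldots, n-1$$

where $\|\sigma_i\| \to 0$ as $\xi \to \infty$. Let us write the equalities obtained in the following form:

$$(f_{\xi n}'' r_i, r_i)^{1/2} q_i = r_i + (f_{\xi n}'' r_i, r_i)^{1/2} \sigma_i.$$

Because of (5.15), vectors $r_j$ and $\bar r_i = (f_{\xi n}'' r_i, r_i)^{1/2} q_i$, $i = 0, 1, \ldots, n-1$, satisfy conditions (5.14). At the same time, vectors $\bar r_i$ satisfy conditions (5.13) since by (5.3)

$$\frac{\|\omega_i\|}{\|r_i\|} = \frac{(f_{\xi n}'' r_i, r_i)^{1/2} \|\sigma_i\|}{\|r_i\|} \le \frac{M^{1/2} \|r_i\|\, \|\sigma_i\|}{\|r_i\|} \to 0.$$

Thus it has been shown that (5.14) holds.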


Vectors $r_i$ with sufficiently large $\xi$ are linearly independent. Indeed, let there be factors $\delta_i$, $i = 0, 1, \ldots, n-1$ (of which at least two are nonzero) such that $\sum_{i=0}^{n-1} \delta_i r_i = 0$. If $\delta_0 \ne 0$, then we obtain

$$\delta_0 (f_{\xi n}'' \bar r_0, r_0) + \sum_{j=1}^{n-1} \delta_j (f_{\xi n}'' \bar r_0, r_j) = 0.$$

However, with sufficiently large values of $\xi$ this equality is not satisfied. Indeed, since $\|\omega_i\| = o(\|r_i\|)$ and $\|r_i\| \to 0$ as $\xi \to \infty$, with sufficiently large $\xi$ according to (5.3) we have

$$(f_{\xi n}'' \bar r_0, r_0) = (f_{\xi n}'' r_0, r_0) + (f_{\xi n}'' r_0, \omega_0) > 0,$$

and at the same time $(f_{\xi n}'' \bar r_0, r_j) = 0$, $j = 1, \ldots, n-1$, because of (5.14). Thus we come to a contradiction, i.e. vectors $r_i$, $i = 0, 1, \ldots, n-1$, are really linearly independent.

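Let $z_{\xi n}$ be the minimum of the quadratic function

$$\varphi(z) = (f_{\xi n}', z - x_{\xi n}) + \tfrac12 (f_{\xi n}'' (z - x_{\xi n}), z - x_{\xi n}).$$

Let us write vector $z_{\xi n} - x_{\xi n}$ in the form

$$z_{\xi n} - x_{\xi n} = \sum_{i=0}^{n-1} a_i r_i. \qquad (5.16)$$

Since $\varphi'(z_{\xi n}) = f_{\xi n}' + f_{\xi n}'' (z_{\xi n} - x_{\xi n}) = 0$, then using (5.16) we obtain

$$\sum_{i=0}^{n-1} a_i f_{\xi n}'' r_i = -f_{\xi n}'.$$

Hence, taking into account (5.14), it follows that coefficients $a_i$ can be calculated by the following formulas:

$$a_i = -\frac{(f_{\xi n}', \bar r_i)}{(f_{\xi n}'' r_i, \bar r_i)}, \quad i = 0, 1, \ldots, n-1.$$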


Let us write the numerator on the right-hand side in the following form:

$$(f_{\xi n}', \bar r_i) = (f_0' - f_1' + f_1' - \ldots - f_{i+1}' + f_{i+1}', \bar r_i) = -\sum_{j=0}^{i} (e_j, \bar r_i) + (f_{i+1}', \bar r_i)$$

(in the last term we take into account that by (5.2) $(f_{i+1}', r_i) = 0$). Hence, having in mind estimates (5.10), it follows that

$$(f_{\xi n}', \bar r_i) = -(e_i, r_i) + \sum_{j=0}^{i-1} o(\|r_i\|\, \|e_j\|). \qquad (5.17)$$

According to conditions (5.8) and (5.9) all of the vectors $r_0, \ldots, r_{n-1}$ are of the same order of smallness (recall that vectors $r_{\xi n+i}$ are actually meant). Since, as was mentioned above, $\|e_i\| \le M \|r_i\|$, vectors $e_0, \ldots, e_{n-1}$ are of the same order of smallness. Taking into account the above remarks, equalities (5.17) can be written in the following form:

$$(f_{\xi n}', \bar r_i) = -(e_i, r_i) + o(\|r_i\|^2), \quad i = 0, 1, \ldots, n-1.$$

Further, taking into account (5.13), we find that

$$(f_{\xi n}'' r_i, \bar r_i) = (f_{\xi n}'' r_i, r_i) + (f_{\xi n}'' \omega_i, r_i) = (f_{ic}'' r_i, r_i) + ((f_{\xi n}'' - f_{ic}'') r_i, r_i) + (f_{\xi n}'' \omega_i, r_i) = (e_i, r_i) + o(\|r_i\|^2).$$

Thus

$$a_i = \frac{(e_i, r_i) + o(\|r_i\|^2)}{(e_i, r_i) + o(\|r_i\|^2)}.$$

By (5.3), we have

$$(e_i, r_i) = (f_{ic}'' r_i, r_i) \ge m \|r_i\|^2, \quad i = 0, 1, \ldots, n-1. \qquad (5.18)$$

Consequently, as $\xi \to \infty$ (i.e. as $\|r_i\| \to 0$)

$$a_i \to 1, \quad i = 0, 1, \ldots, n-1. \qquad (5.19)$$
Since $x_{(\xi+1)n} - x_{\xi n} = \sum_{i=0}^{n-1} r_i$, we have

$$x_{(\xi+1)n} - z_{\xi n} = (x_{(\xi+1)n} - x_{\xi n}) - (z_{\xi n} - x_{\xi n}) = \sum_{i=0}^{n-1} (r_i - a_i r_i).$$

Hence, taking into account (5.13) and (5.19), we obtain

$$\|x_{(\xi+1)n} - z_{\xi n}\| = \sum_{i=0}^{n-1} o(\|r_i\|)$$

or, using (5.8) and (5.9),

$$\|x_{(\xi+1)n} - z_{\xi n}\| = o(\|f_{\xi n}'\|). \qquad (5.20)$$

Since $z_{\xi n} - x_{\xi n} = -(f_{\xi n}'')^{-1} f_{\xi n}'$ and taking into account (5.20), we have

$$x_{(\xi+1)n} - x_{\xi n} = (x_{(\xi+1)n} - z_{\xi n}) + (z_{\xi n} - x_{\xi n}) = -(f_{\xi n}'')^{-1} f_{\xi n}' + \eta_{\xi n}$$

where $\|\eta_{\xi n}\| = o(\|f_{\xi n}'\|)$. It follows that there is a sequence of matrices $D_{\xi n} \to (f''(x_*))^{-1}$ such that

$$x_{(\xi+1)n} - x_{\xi n} = -D_{\xi n} f_{\xi n}'. \qquad (5.21)$$

(We can take, for instance,

$$D_{\xi n} = (f_{\xi n}'')^{-1} + \frac{(z_{\xi n} - x_{(\xi+1)n})\, (f_{\xi n}')^*}{(f_{\xi n}', f_{\xi n}')}.)$$

Equality (5.21) shows that sequence $\{x_{\xi n}\}$, $\xi = 0, 1, \ldots$, converges at a superlinear rate to the solution; the corresponding bounds on the rate of convergence can be obtained just as was done in theorem 3.1 for sequence $\{x_k\}$.

Thus, assuming that condition (5.6) is satisfied and that estimates (5.10) and (5.9) hold, we have demonstrated that for $\{x_{\xi n}\}$ the inequality

$$\|x_{(\xi+1)n} - x_*\| \le \lambda_{\xi n}\,\|x_{\xi n} - x_*\|, \tag{5.22}$$

where $\lambda_{\xi n} \to 0$ as $\xi \to \infty$, holds. However, if condition (5.6) holds, then inequality (5.22) cannot be satisfied, since with (5.6) fulfilled we have

$$\|x_{(\xi+1)n} - x_*\| \ge \lambda\,\|x_{\xi n} - x_*\|. \tag{5.23}$$

Thus we have come to a contradiction. This means that condition (5.6) (or (5.23)) cannot be fulfilled for process (5.1).

The impossibility of fulfilling condition (5.6) for any $\xi$ means, in fact, that for process (5.1) (with reinitialization) inequality (5.23) cannot be fulfilled on any subsequence $\{\xi_m\}$, $m = 0, 1, \ldots$. If there were a sequence $\{\xi_m\}$ such that

$$\|x_{(\xi_m+1)n} - x_*\| \ge \lambda\,\|x_{\xi_m n} - x_*\|, \tag{5.24}$$

then for any $\xi_m n \le k < (\xi_m + 1)n$ estimates (5.9), (5.10) would be fulfilled for methods with reinitialization; this will be shown in studying the properties of such processes in the next subsection. Therefore, repeating the above argument, we would conclude that at the iterations which correspond to the sequence $\{\xi_m\}$ inequality (5.22) is fulfilled, and this contradicts (5.24).

Thus, for process (5.1) with restoration of matrix $H_k$, inequality (5.24) cannot be fulfilled. It follows that for any constant $\lambda > 0$ there is a number $\eta$ such that for $\xi > \eta$ condition (5.22) is fulfilled, i.e. the sequence $\{x_{\xi n}\}$ converges to the solution at a superlinear rate.
Study of Properties of Different Algorithms


We turn now to the proof of the validity of estimates (5.9), (5.10) for different methods of conjugate directions with restoration of matrix $H_k$ after $n$ steps, assuming that inequality (5.6) (or (5.8)) is fulfilled.

The fact that the estimates hold for any of these methods is established by induction: it is demonstrated that estimates (5.9), (5.10) hold for $i \ne j$, $i, j = 0, 1$; then, supposing that these estimates hold for $0 \le i, j \le \tau < n - 1$, we prove that they remain valid also for $0 \le i, j \le \tau + 1$.

1. Method (4.48). If restoration of matrix (4.48) is performed after a finite number of steps, then for any $\xi$ the matrix $H_k$ is bounded:

$$\|H_k\| \le L, \quad L < \infty. \tag{5.25}$$

We show now how this can be proved. By (5.2),

$$(H_k f'_k, f'_{k+1}) = -(p_k, f'_{k+1}) = 0.$$

Therefore

$$(H_k e_k, e_k) = (H_k f'_k, f'_k) + (H_k f'_{k+1}, f'_{k+1}). \tag{5.26}$$

Since $H_k$ is positive definite (Sec. 4), we have $(H_k e_k, e_k) \ge -(p_k, f'_k) = (p_k, e_k)$. Hence, according to (5.18),

$$(H_k e_k, e_k) \ge \frac{m}{\alpha_k}\|r_k\|^2. \tag{5.27}$$

Taking into account moreover that $\|e_k\| \le M\|r_k\|$, we obtain from (4.48) that

$$\|H_{k+1}\| \le \|H_k\| + \frac{\|r_k\|^2}{m\|r_k\|^2} + \frac{\alpha_k \|H_k\|^2 M^2 \|r_k\|^2}{m\|r_k\|^2}.$$

Using condition (5.4) one can easily ascertain that $\alpha_{\xi n} \le \bar\alpha < \infty$, and on the strength of this it follows from the recursive inequality for $\|H_{k+1}\|$ that estimate (5.25) holds for $H_{\xi n+1}$. On this ground we shall prove below by induction that $\alpha_{\xi n+i} \le \bar\alpha < \infty$ for any $i = 1, \ldots, n-1$. Taking this into account we find that (5.25) holds.


Let us prove now that with $i = 1$ the following relations hold:

$$(r_1, e_0) = 0, \quad (e_1, r_0) = o(\|r_0\|\,\|r_1\|), \quad C_1\|f'_1\| \le \|r_1\| \le N_1\|f'_1\|, \tag{5.28}$$

where the constants $N_1$, $C_1$ are independent of $\xi$ and $C_1 > 0$. The first of these estimates is found as follows:

$$(r_1, e_0) = -\alpha_1 (H_1 f'_1, e_0) = -\alpha_1 (f'_1, H_1 e_0).$$

But $H_1 e_0 = r_0$, therefore $(r_1, e_0) = -\alpha_1 (f'_1, r_0) = 0$. Further,

$$(e_1, r_0) = (f''_{1c} r_1, r_0) = (r_1, f''_{0c} r_0) + (r_1, (f''_{1c} - f''_{0c}) r_0) = (r_1, e_0) + o(\|r_1\|\,\|r_0\|) = o(\|r_1\|\,\|r_0\|).$$

Let us now show that the estimates hold for $\|r_1\|$. It follows from (4.48), taking into account (5.2) and (5.26), that

$$(H_1 f'_1, f'_1) = (H_0 f'_1, f'_1) - \frac{(H_0 e_0, f'_1)^2}{(H_0 e_0, e_0)} = \frac{(H_0 f'_1, f'_1)}{1 + \dfrac{(H_0 f'_1, f'_1)}{(H_0 f'_0, f'_0)}}.$$


Using estimates (1.14), (1.15) and (5.7) we deduce that for a func- 
tion which satisfies (5.3) 


m (14-2) (f(z) —f.)<IIF (2) P< (F(2)—-f.)- (5-29) 
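Taking into account estimates (5.4) and (5.29) we have on the set $S_0 = \{x : f(x) \le f(x_0)\}$

$$\frac{(H_0 f'_1, f'_1)}{(H_0 f'_0, f'_0)} \le \frac{M_0\|f'_1\|^2}{m_0\|f'_0\|^2} \le \frac{d_1 (f_1 - f_*)}{d_2 (f_0 - f_*)} \le \frac{d_1}{d_2},$$

where $d_1$, $d_2$ are constants independent of $\xi$. By virtue of this,

$$(H_1 f'_1, f'_1) \ge \frac{m_0\|f'_1\|^2}{1 + d_1/d_2} = a_1\|f'_1\|^2, \tag{5.30}$$

where $a_1 = m_0 / (1 + d_1/d_2)$ is independent of $\xi$.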
2 


Let us now use inequality (5.30) in order to estimate the value of parameter $\alpha_{\xi n+1}$. Since

$$f_2 - f_1 = \alpha_1 (f'_1, p_1) + \frac{\alpha_1^2}{2}(f''_{1c} p_1, p_1)$$

and $\alpha_1$ is chosen according to condition (5.2), it is clear that

$$-\frac{(f'_1, p_1)}{M\|p_1\|^2} \le \alpha_1 \le -\frac{(f'_1, p_1)}{m\|p_1\|^2}.$$

Now by (5.30), we have $-(f'_1, p_1) = (H_1 f'_1, f'_1) \ge a_1\|f'_1\|^2$ and by (5.25), $\|p_1\| = \|H_1 f'_1\| \le L\|f'_1\|$; taking these estimates into account, we have $\alpha_1 \ge \dfrac{a_1}{ML^2} = \underline\alpha > 0$. At the same time it follows from (5.30) that $\|p_1\| \ge a_1\|f'_1\|$. Using this estimate we can easily establish that $\alpha_1 \le \dfrac{L}{m a_1^2} = \bar\alpha < \infty$. Thus we find that

$$N_1\|f'_1\| = \bar\alpha L\|f'_1\| \ge \|r_1\| = \alpha_1\|H_1 f'_1\| \ge \underline\alpha\, a_1\|f'_1\| = C_1\|f'_1\|,$$

where the constants $N_1$, $C_1$ are independent of $\xi$, i.e. we have established that estimates (5.28) hold.
Suppose that the estimates

$$(r_i, e_j) = o(\|r_i\|\,\|r_j\|), \quad i \ne j, \quad 0 \le i, j \le \tau < n - 1, \tag{5.31}$$

$$C_i\|f'_i\| \le \|r_i\| \le N_i\|f'_i\|, \quad 0 \le i \le \tau, \tag{5.32}$$

where the constants $N_i$, $C_i > 0$ are independent of $\xi$, hold. Let us show that similar estimates hold also for $0 \le i, j \le \tau + 1$. We have

$$(f'_{\tau+1}, r_j) = (f'_{j+1}, r_j) + (e_{j+1} + \cdots + e_\tau, r_j), \quad 0 \le j < \tau. \tag{5.33}$$

According to condition (5.8) and estimates (5.32), the quantities $\|f'_{\tau+1}\|$, $\|f'_i\|$ and $\|r_i\|$ with any $0 \le i \le \tau$ are of the same order of smallness. Taking this into account and using conditions (5.2) and (5.31) we find according to (5.33) that

$$(f'_{\tau+1}, r_j) = o(\|f'_{\tau+1}\|\,\|r_j\|) = o(\|r_j\|^2), \quad 0 \le j < \tau.$$

Since $(f'_{\tau+1}, r_\tau) = 0$ according to (5.2), we obtain finally

$$(f'_{\tau+1}, r_j) = o(\|r_j\|^2), \quad 0 \le j \le \tau. \tag{5.34}$$


Let us now estimate the quantity $(H_{\tau+1} f'_{\tau+1}, f'_{\tau+1})$. Using formula (4.48) and taking into account (5.26) we obtain for any $0 \le j \le \tau$:

$$(H_{j+1} f'_{\tau+1}, f'_{\tau+1}) = (H_j f'_{\tau+1}, f'_{\tau+1}) + \frac{(r_j, f'_{\tau+1})^2}{(r_j, e_j)} - \frac{(H_j e_j, f'_{\tau+1})^2}{(H_j e_j, e_j)}$$

$$\ge \frac{1}{(H_j e_j, e_j)}\Big( (H_j f'_{\tau+1}, f'_{\tau+1})(H_j f'_{j+1}, f'_{j+1}) + (H_j f'_{\tau+1}, f'_{\tau+1})(H_j f'_j, f'_j) - (H_j f'_{j+1}, f'_{\tau+1})^2 - (H_j f'_j, f'_{\tau+1})^2 + 2 (H_j f'_{j+1}, f'_{\tau+1})(H_j f'_j, f'_{\tau+1}) \Big).$$

On the right-hand side of this inequality the difference between the first and the third terms of the numerator is nonnegative by the Cauchy-Buniakowski inequality. Taking into account estimates (5.34), (5.27) and (5.25) and that $\alpha_j$, $j \le \tau$, is bounded, it is easy to ascertain that the ratio of the last two terms of the numerator to the denominator is of the order of $o(\|r_j\|\,\|f'_{\tau+1}\|) = o(\|f'_{\tau+1}\|^2)$. Hence

$$(H_{j+1} f'_{\tau+1}, f'_{\tau+1}) \ge \frac{(H_j f'_{\tau+1}, f'_{\tau+1})(H_j f'_j, f'_j)}{(H_j e_j, e_j)} - o(\|f'_{\tau+1}\|^2).$$


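Estimates (5.32) imply that there are constants $a_j$ independent of $\xi$ such that $(H_j f'_j, f'_j) = -(p_j, f'_j) \ge a_j\|f'_j\|^2$. Making use of this fact and of (5.29) we have

$$(H_{j+1} f'_{\tau+1}, f'_{\tau+1}) \ge \frac{a_j\|f'_j\|^2}{L(\|f'_j\|^2 + \|f'_{j+1}\|^2)}(H_j f'_{\tau+1}, f'_{\tau+1}) - o(\|f'_{\tau+1}\|^2) \ge \tilde a_j (H_j f'_{\tau+1}, f'_{\tau+1}) - o(\|f'_{\tau+1}\|^2), \tag{5.35}$$

where $\tilde a_j > 0$ and is independent of $\xi$ (by (5.32)).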
It was noted in the preceding subsection that for processes with restoration of $H_k$ we have $\|f'_k\| \to 0$ as $k \to \infty$. Therefore, it follows from inequalities (5.35), taking into account that matrix $H_j$ is positive definite, that if for any $\xi$ we have $(H_j f'_{\tau+1}, f'_{\tau+1}) \ge \gamma_j\|f'_{\tau+1}\|^2$, where $\gamma_j > 0$ and is independent of $\xi$, then there is a constant $\gamma_{j+1} > 0$ such that for any $\xi$ we shall have $(H_{j+1} f'_{\tau+1}, f'_{\tau+1}) \ge \gamma_{j+1}\|f'_{\tau+1}\|^2$. But in estimating the quantity $(H_1 f'_{\tau+1}, f'_{\tau+1})$ we find, since $(H_0 f'_{\tau+1}, f'_{\tau+1}) \ge m_0\|f'_{\tau+1}\|^2$, that there is a constant $\gamma_1$ such that $(H_1 f'_{\tau+1}, f'_{\tau+1}) \ge \gamma_1\|f'_{\tau+1}\|^2$ for any $\xi$. Taking this into account, our argument by induction shows that there is a constant $a_{\tau+1}$ independent of $\xi$ and such that $(H_{\tau+1} f'_{\tau+1}, f'_{\tau+1}) \ge a_{\tau+1}\|f'_{\tau+1}\|^2$. We establish now, just as we did above, that

$$\frac{a_{\tau+1}}{ML^2} \le -\frac{(f'_{\tau+1}, p_{\tau+1})}{M\|p_{\tau+1}\|^2} \le \alpha_{\tau+1} \le -\frac{(f'_{\tau+1}, p_{\tau+1})}{m\|p_{\tau+1}\|^2} \le \frac{L}{m a_{\tau+1}^2}.$$

Therefore, we have

$$N_{\tau+1}\|f'_{\tau+1}\| \ge \|r_{\tau+1}\| = \alpha_{\tau+1}\|H_{\tau+1} f'_{\tau+1}\| \ge C_{\tau+1}\|f'_{\tau+1}\|. \tag{5.36}$$

Let us show now that

$$H_{\tau+1} e_j = r_j + \eta_j, \quad 0 \le j \le \tau, \tag{5.37}$$

where $\|\eta_j\| = o(\|r_j\|)$.

Multiplying both sides of formula (4.48) by $e_j$ we obtain

$$H_{s+1} e_j = H_s e_j + \frac{r_s (r_s, e_j)}{(r_s, e_s)} - \frac{H_s e_s (H_s e_s, e_j)}{(H_s e_s, e_s)}. \tag{5.38}$$

If we assume that for a certain $s$, $j + 1 \le s \le \tau$, the equalities $H_s e_j = r_j + \eta_j$ hold, where $\|\eta_j\| = o(\|r_j\|)$, then using estimates (5.31), (5.27), (5.25) and taking into account that all of the quantities $\|r_i\|$ are of the same order of smallness we also have by (5.38) that $H_{s+1} e_j = r_j + \eta_j$, where $\|\eta_j\| = o(\|r_j\|)$. But $H_{j+1} e_j = r_j$, and we establish by induction that equalities (5.37) hold true.
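
The starting point of this induction, $H_{j+1} e_j = r_j$, can be illustrated with a short sketch. It assumes that the update has the form shown in (5.38) applied with $s = j$, that $H$ is symmetric and that $(r, e) > 0$; the function name `dfp_update` and the sample vectors are hypothetical and chosen only for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def dfp_update(H, r, e):
    """H + r r^T / (r, e) - (H e)(H e)^T / (H e, e), for symmetric H."""
    He = matvec(H, e)
    re, eHe = dot(r, e), dot(He, e)
    n = len(r)
    return [[H[i][j] + r[i] * r[j] / re - He[i] * He[j] / eHe
             for j in range(n)] for i in range(n)]

# Illustrative data (assumptions, not from the source).
H = [[1.0, 0.0], [0.0, 1.0]]   # current matrix H_s
r = [1.0, 2.0]                  # step r_s
e = [3.0, 1.0]                  # gradient change e_s, with (r, e) = 5 > 0

H_next = dfp_update(H, r, e)
secant = matvec(H_next, e)      # the updated matrix maps e_s back to r_s
```

The later updates perturb this exact relation only through the cross terms $(r_s, e_j)$ and $(H_s e_s, e_j)$ with $s \ne j$, which is why the estimates (5.31) control the size of $\eta_j$.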
Taking into account (5.37) we have

$$(r_{\tau+1}, e_j) = -\alpha_{\tau+1}(H_{\tau+1} f'_{\tau+1}, e_j) = -\alpha_{\tau+1}(f'_{\tau+1}, r_j + \eta_j),$$

therefore by (5.34), we find that

$$(r_{\tau+1}, e_j) = o(\|r_j\|^2) + o(\|f'_{\tau+1}\|\,\|r_j\|), \quad 0 \le j \le \tau.$$

Inequalities (5.8) and (5.36) show that $\|r_{\tau+1}\|$ is of the same order of smallness as $\|f'_{\tau+1}\|$ and consequently as $\|r_j\|$, $0 \le j \le \tau$. It follows that

$$(r_{\tau+1}, e_j) = o(\|r_{\tau+1}\|\,\|r_j\|) = o(\|r_{\tau+1}\|^2), \quad 0 \le j \le \tau. \tag{5.39}$$

Taking this into account we establish, in a manner analogous to that used before with $i = 1$, that also

$$(e_{\tau+1}, r_j) = o(\|r_{\tau+1}\|^2), \quad 0 \le j \le \tau. \tag{5.40}$$
The relations (5.36), (5.39) and (5.40) show that estimates (5.31) and (5.32) indeed hold with $\tau + 1$ too.

Thus it has been established that estimates (5.9), (5.10) hold for method (4.48) if it is assumed that process (5.1) is realized with restoration of matrix $H_k$ after a finite number of steps.

The above argument can be repeated word for word if we assume that inequality (5.24) (or the corresponding lower bound on $\|f'_{(\xi_m+1)n}\|$ in terms of $\|f'_{\xi_m n}\|$) is satisfied rather than condition (5.6) (or (5.23)) and consider only the iterations which correspond to the subsequence $\{\xi_m\}$. At these iterations estimates (5.9) and (5.10) remain valid.

The superlinear rate of convergence of the method follows from this fact, as was shown in the preceding subsection.

2. Method (4.49). If matrix $H_k$ is restored after a finite number of steps, then for any $\xi$ the matrix has a bound. This follows from the inequality

$$\|H_{k+1}\| \le \|H_k\| + \frac{\|r_k\|^2}{m\|r_k\|^2} + \frac{\|H_k\|\,M\,\|r_k\|^2}{m\|r_k\|^2}.$$

With $i = 1$

$$(H_1 f'_1, f'_1) = (H_0 f'_1, f'_1) \ge m_0\|f'_1\|^2.$$

Making use of these relations and reasoning as in studying method (4.48) we establish that estimates (5.28) hold, and then, assuming that estimates (5.31) and (5.32) hold, we demonstrate that estimate (5.34) holds.

Further we have that

$$(H_{\tau+1} f'_{\tau+1}, f'_{\tau+1}) = (H_0 f'_{\tau+1}, f'_{\tau+1}) + \sum_{i=0}^{\tau} \frac{(r_i - H_i e_i, f'_{\tau+1})\,(f'_{\tau+1}, r_i)}{(r_i, e_i)}.$$

Using estimates (5.18), (5.34) and the fact that $H_i$ has a bound, and taking into account that all the quantities $\|r_i\|$, $\|e_i\|$, $\|f'_{\tau+1}\|$ are of the same order of smallness, we find that

$$(H_{\tau+1} f'_{\tau+1}, f'_{\tau+1}) \ge m_0\|f'_{\tau+1}\|^2 + o(\|f'_{\tau+1}\|^2).$$

Consequently, with sufficiently small values of $\|f'_{\tau+1}\|$ (i.e. with sufficiently large $\xi$) we have

$$(H_{\tau+1} f'_{\tau+1}, f'_{\tau+1}) \ge a_{\tau+1}\|f'_{\tau+1}\|^2,$$

where $a_{\tau+1} > 0$ and is independent of $\xi$. This being so, we find that $\bar\alpha \ge \alpha_{\tau+1} \ge \underline\alpha > 0$ and $C_{\tau+1}\|f'_{\tau+1}\| \le \|r_{\tau+1}\| \le N_{\tau+1}\|f'_{\tau+1}\|$.

Using the equalities

$$H_{s+1} e_j = H_s e_j + (r_s - H_s e_s)\frac{(r_s, e_j)}{(r_s, e_s)}, \quad j + 1 \le s \le \tau,$$

taking into account estimates (5.31), (5.18), the fact that $H_s$ has a bound, and reasoning as we did in studying (4.48), we ascertain that matrix $H_{\tau+1}$ satisfies equations (5.37) and consequently estimates (5.39) and (5.40) remain valid.

This completes the proof that our reasoning by induction holds. 

The study of method (4.52) is carried out in a similar way. 

Remark. If at an iteration of the initial stage of the process we find that $p_k = -H_k f'_k = 0$, then it is necessary to restart the process, restoring matrix $H_k$.

3. Method (4.53). The technique of the proofs pertaining to matrix $H_k$ in this case is just the same as in method (4.49). Note only that in this method matrix $H_{\tau+1}$ satisfies equations


1,430; = Nj: O<cjxt 


where $\|\eta_j\| = o(\|r_j\|)$, rather than conditions (5.37). This, however, simplifies the derivation of estimate (5.39).

The study of method (4.54) is analogous. 

4. Method (4.69). Matrix $H_k$ (4.69) determines vector $p_k$ (4.63), in which $\beta_k$ is calculated by formula (4.64).
For this method 
Mé lf, UW? Moll fy Ml An-a Wl fe_s Il 
mo |I Ff), Il? 


By (0.8) with any k we have niet <> 
k 


independent of & Taking this into account we find that with any & 
for the process with restoration, || H; ||<L. 


Let us show now that for any $k$ we have $\beta_k \le \bar\beta < 1$, i.e. that

$$1 - \beta_k \ge 1 - \bar\beta = \tilde\beta > 0, \tag{5.41}$$

where $\tilde\beta$ is independent of $\xi$.
Using (4.65) we find that 


Ba = 


| Hf}, |< Mo+ 


=dy, Where d, is 


(Hof,. fy) 
(Hof,, f,) + (1 —Br- 1) (LHof;,_ ik fh - 1) 
(Hofp_ ys tpg) © 
(Hof, fp) 
Because of (5.4) and (0.8) with any & 
(Hof, _ 4: fr_1) M || th [|* 
(Hofp, fy) ~*~ moll flr 


where y is independent of & 


— << 
—» 


1+ (1—Bp-1) 


Sy< 0 
Besides, using (4.65) and (4.64) it is easy to establish by induction that for any $k$ we have $0 \le \beta_k < 1$. Consequently

$$\beta_k \le \frac{1}{1 + (1 - \beta_{k-1})\gamma}.$$

Hence, taking into account that $\beta_{\xi n} = 0$, $\xi = 0, 1, \ldots$, it can be established that (5.41) holds. It follows from (4.65), using (5.41), that for any $k$

$$-(p_k, f'_k) \ge \tilde\beta m_0\|f'_k\|^2,$$

i.e.

$$\|p_k\| \ge \tilde\beta m_0\|f'_k\|.$$
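
The way the bound (5.41) follows from the recursive inequality $\beta_k \le 1 / (1 + (1 - \beta_{k-1})\gamma)$ with $\beta = 0$ at each restart can be seen by iterating the worst case. The sketch below is an illustration only: the value of `gamma`, the cycle length `n` and the function name `beta_upper_bounds` are assumptions, not values from the text.

```python
def beta_upper_bounds(gamma, n):
    """Worst-case values b_0 = 0, b_k = 1 / (1 + (1 - b_{k-1}) * gamma)."""
    bounds = [0.0]
    for _ in range(1, n):
        bounds.append(1.0 / (1.0 + (1.0 - bounds[-1]) * gamma))
    return bounds

gamma = 0.5   # illustrative positive constant
n = 5         # illustrative number of steps between restorations
bounds = beta_upper_bounds(gamma, n)
beta_bar = max(bounds)          # plays the role of the bound on beta_k
beta_tilde = 1.0 - beta_bar     # plays the role of the constant in (5.41)
```

Each worst-case value stays strictly below 1 because $(1 - b_{k-1})\gamma > 0$, and since only finitely many steps separate two restorations, the largest of them gives a bound that does not depend on $\xi$.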


Taking into account this inequality and the fact that $H_k$ is bounded, we establish, as in the methods considered above, that $0 < \underline\alpha \le \alpha_k \le \bar\alpha$ and $C\|f'_k\| \le \|r_k\| \le N\|f'_k\|$. Thus it has been proved that estimates (5.9) hold.

Let us show that estimates (5.10) hold. With $k = \xi n + 1$ we have

$$(p_1, e_0) = -(H_0 f'_1, e_0) + \beta_1\big[(H_0 f'_1, e_0) + (p_0, e_0)\big].$$

But $(H_0 f'_1, e_0) = (H_0 f'_1, f'_1)$ and, because of (4.61), $(p_0, e_0) = -(p_0, f'_0)$. Making use of these equalities we find that $(p_1, e_0) = 0$.

If estimates (5.31) hold, then in the same way as was done for method (4.48) it can be proved that estimates (5.34) hold.

Further, using (4.63) we establish that

$$(p_{\tau+1}, e_j) = (\beta_{\tau+1} - 1)(H_0 f'_{\tau+1}, e_j) + \beta_{\tau+1}(p_\tau, e_j).$$

Let us estimate the quantity $(H_0 f'_{\tau+1}, e_j)$. It follows from (4.63) that

$$H_0 f'_j = \frac{1}{\beta_j - 1}(p_j - \beta_j p_{j-1}). \tag{5.42}$$

Taking this into account we have

$$H_0 e_j = \frac{1}{\beta_{j+1} - 1}(p_{j+1} - \beta_{j+1} p_j) - \frac{1}{\beta_j - 1}(p_j - \beta_j p_{j-1}).$$

Using this expression and taking into account estimates (5.34), (5.41) we establish that

$$(f'_{\tau+1}, H_0 e_j) = o(\|p_{\tau+1}\|^2) = o(\|r_{\tau+1}\|^2), \quad 0 \le j < \tau.$$

Since by (5.31) we have also $(p_\tau, e_j) = o(\|r_{\tau+1}\|^2)$, $0 \le j < \tau$, we find that

$$(p_{\tau+1}, e_j) = o(\|r_{\tau+1}\|^2), \quad 0 \le j < \tau.$$

With $j = \tau$

$$(p_{\tau+1}, e_\tau) = (\beta_{\tau+1} - 1)(H_0 f'_{\tau+1}, e_\tau) + \beta_{\tau+1}(p_\tau, e_\tau).$$
Hence, using (4.64) and (4.61), we obtain:

$$(p_{\tau+1}, e_\tau) = -\frac{(H_0 f'_{\tau+1}, f'_\tau)\,(p_\tau, f'_\tau)}{(H_0 f'_{\tau+1}, f'_{\tau+1}) - (p_\tau, f'_\tau)}.$$

Since $(H_0 f'_{\tau+1}, f'_{\tau+1}) > 0$, we have

$$|(p_{\tau+1}, e_\tau)| \le |(H_0 f'_{\tau+1}, f'_\tau)|.$$

If the quantity $(H_0 f'_{\tau+1}, f'_\tau)$ is estimated using expression (5.42) with $j = \tau$, then making use of (5.34) and (5.41) we find that

$$(H_0 f'_{\tau+1}, f'_\tau) = o(\|r_{\tau+1}\|^2)$$

(it has been taken into account that, since (5.9) holds, the quantities $\|r_i\|$ and $\|f'_i\|$, $i = 0, 1, \ldots, n-1$, are of the same order of smallness). Thus

$$(p_{\tau+1}, e_j) = o(\|r_{\tau+1}\|^2), \quad 0 \le j \le \tau,$$

i.e. for the method under consideration estimates (5.39) hold. Consequently estimates (5.40) hold too.

Thus we have established that for method (4.69), assuming that condition (5.6) is fulfilled, estimates (5.9) and (5.10) hold too.

5. Method (4.71). By (4.67) and (5.4) we have for this method

$$-(p_k, f'_k) \ge m_0\|f'_k\|^2. \tag{5.43}$$

Taking this into account we have

$$\|H_k\| \le \|H_0\| + \frac{\|H_0\|\,\|f'_k\|\,\|H_{k-1}\|}{m_0\|f'_{k-1}\|}.$$

Since for any $k$ the ratio $\|f'_k\| / \|f'_{k-1}\|$ on the set $S_0$ is a bounded quantity and $\|H_0\| \le M_0$, we have

$$\|H_k\| \le M_0 + d\,\|H_{k-1}\|,$$

where $d$ is a constant. It follows that if the matrix is restored after a finite number of steps, then for any $k$ matrix $H_k$ has a bound: $\|H_k\| \le L$. Making use of this fact and estimate (5.43) we ascertain that for any $k$, $\bar\alpha \ge \alpha_k \ge \underline\alpha > 0$ and $N\|f'_k\| \ge \|r_k\| \ge C\|f'_k\|$.

Consequently, for method (4.71) estimates (5.9) hold. Let us prove that estimates (5.10) hold:

$$(p_{k+1}, e_k) = -(H_0 f'_{k+1}, e_k) - \frac{(H_0 f'_{k+1}, f'_{k+1})}{(p_k, f'_k)}(p_k, e_k).$$

Hence, taking (4.61) into account,

$$(p_{k+1}, e_k) = -(H_0 f'_{k+1}, e_k) + (H_0 f'_{k+1}, f'_{k+1}) = (H_0 f'_{k+1}, f'_k). \tag{5.44}$$

With $k = \xi n$, (5.44) yields

$$(p_1, e_0) = (H_0 f'_1, f'_0) = -(f'_1, p_0) = 0.$$
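
The identity $(p_1, e_0) = 0$ can be verified on a small quadratic example. The sketch below is an illustration under stated assumptions: $H_0$ is taken to be the identity, the direction is built as $p_1 = -H_0 f'_1 + \beta_1 p_0$ with $\beta_1 = -(H_0 f'_1, f'_1) / (p_0, f'_0)$ (the form that appears in the derivation of (5.44)), the step is found by exact minimization along $p_0$, and the matrix `A`, vector `b` and all helper names are made up.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

# Illustrative quadratic f(x) = 0.5 (A x, x) - (b, x); A and b are assumptions.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]

def grad(x):
    return [ax - bi for ax, bi in zip(matvec(A, x), b)]

x0 = [0.0, 0.0]
g0 = grad(x0)
p0 = [-g for g in g0]                              # p_0 = -H_0 f'_0 with H_0 = I
alpha0 = -dot(g0, p0) / dot(p0, matvec(A, p0))     # exact line search on a quadratic
x1 = [x + alpha0 * p for x, p in zip(x0, p0)]
g1 = grad(x1)

beta1 = -dot(g1, g1) / dot(p0, g0)                 # -(H_0 f'_1, f'_1) / (p_0, f'_0)
p1 = [-g + beta1 * p for g, p in zip(g1, p0)]      # p_1 = -H_0 f'_1 + beta_1 p_0
e0 = [a - c for a, c in zip(g1, g0)]               # e_0 = f'_1 - f'_0
inner = dot(p1, e0)                                # should vanish
```

With $H_0 = I$ this is the familiar conjugate gradient construction; for a general positive definite $H_0$ the same cancellation relies on $(p_0, f'_1) = 0$, which the exact line search provides.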


Using (4.66) we can obtain the following expressions:

$$(p_{\tau+1}, e_j) = -(H_0 f'_{\tau+1}, e_j) + \beta_{\tau+1}(p_\tau, e_j),$$

$$H_0 f'_j = -p_j + \beta_j p_{j-1},$$

$$H_0 e_j = -p_{j+1} + p_j - \beta_j p_{j-1} + \beta_{j+1} p_j.$$

If we assume that estimates (5.31) hold, then arguing as in method (4.48) we can prove that estimates (5.34) hold. Note also that

$$\beta_k = -\frac{(H_0 f'_k, f'_k)}{(p_{k-1}, f'_{k-1})} \le \frac{M_0\|f'_k\|^2}{m_0\|f'_{k-1}\|^2} \le \frac{d_1}{d_2}.$$

Taking this into account and reasoning as we did in method (4.69) we establish that

$$(p_{\tau+1}, e_j) = o(\|r_{\tau+1}\|^2), \quad 0 \le j < \tau,$$

and, besides,

$$(H_0 f'_{\tau+1}, f'_\tau) = o(\|r_{\tau+1}\|^2);$$

hence it follows from (5.44) that

$$(p_{\tau+1}, e_\tau) = o(\|r_{\tau+1}\|^2).$$

Thus

$$(p_{\tau+1}, e_j) = o(\|r_{\tau+1}\|^2), \quad 0 \le j \le \tau,$$

and this proves that estimates (5.39) hold. The validity of estimates (5.40) is established in the same manner as was done for method (4.48). Consequently, estimates (5.10) hold for the method under consideration.

In studying method (4.48) we noted that the proof of estimates 
(5.9), (5.10) can be given assuming that condition (5.24) is fulfilled; 
it is only necessary to repeat the above argument for corresponding 
iterations. This remark is applicable to other methods considered 
above and this implies that they converge at a superlinear rate. 


Further Study of the Rate of Convergence


1. Suppose now that matrix $f''(x)$, besides conditions (5.3), satisfies Lipschitz condition (2.8). In this case it is possible to obtain a more precise bound on the rate of convergence of the sequence $\{x_{\xi n}\}$.
For convenience of reference, we list the different relations which hold if (5.3) does (many of them were often used before):

$$m\|x - x_*\| \le \|f'(x)\| \le M\|x - x_*\|, \tag{5.45}$$

$$d_1[f(x) - f(x_*)] \le \|f'(x)\|^2 \le d_2[f(x) - f(x_*)] \tag{5.46}$$

(the constants $d_1$ and $d_2$ are independent of the choice of point $x$),

$$m\|r_k\| \le \|e_k\| \le M\|r_k\|. \tag{5.47}$$

Let $x$, $y$ be arbitrary points such that

$$f(y) \le f(x). \tag{5.48}$$

Then making use of (5.46) we establish that

$$\|f'(y)\| \le C\|f'(x)\|. \tag{5.49}$$

Here and further on in this subsection $C$ denotes various constants (not equal to zero) which are independent of the choice of points $x, y \in E^n$.

If (5.48) is satisfied, we have by (5.45) and (5.49)

$$\|y - x_*\| \le C\|x - x_*\|. \tag{5.50}$$
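
As a quick sanity check of the two-sided bound (5.45), the sketch below evaluates it for a simple quadratic. The function, the constants `m` and `M` (the smallest and largest eigenvalues of its Hessian) and the sample points are illustrative assumptions, not data from the text.

```python
import math

# Illustrative function f(x) = 0.5 * (x1^2 + 4 * x2^2), minimized at x_* = (0, 0).
def grad(x):
    return [1.0 * x[0], 4.0 * x[1]]

m, M = 1.0, 4.0   # eigenvalue bounds of the (constant) Hessian diag(1, 4)
points = [[1.0, 0.0], [0.0, 1.0], [0.3, -0.7], [-2.0, 0.5]]

checks = []
for x in points:
    dist = math.hypot(x[0], x[1])        # ||x - x_*|| with x_* = 0
    g = grad(x)
    gnorm = math.hypot(g[0], g[1])       # ||f'(x)||
    checks.append(m * dist <= gnorm + 1e-12 and gnorm <= M * dist + 1e-12)
```

For a quadratic the bound is attained along the eigenvectors, as the first two sample points show.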


2. Suppose that for the iterative processes being studied the following estimate holds:

$$\|f'_{\xi n+i}\|\,\|f'_{\xi n+i+1}\| \le \lambda_\xi\|f'_{\xi n+j}\|, \quad 0 \le i < j \le n - 1. \tag{5.51}$$

Here and further on, $\lambda_\xi$ will denote different variables tending to zero as $\xi \to \infty$.

In what follows we shall limit ourselves to the study of the properties of method (4.48). However, the results obtained (lemma 5.1, theorem 5.2) hold also for other algorithms of conjugate directions.

Lemma 5.1. Let process (5.1) be used for the minimization of a twice continuously differentiable function $f(x)$ which also satisfies conditions (5.3) and (2.8); in this process the construction of matrix $H_k$ is performed by formula (4.48). Then if inequalities (5.51) hold, estimates (5.9) also hold and moreover

$$|(r_{\xi n+i}, e_{\xi n+j})| \le C\|r_{\xi n+t}\|^2\,\|r_{\xi n+t+1}\|, \quad t = \min\{i, j\}, \quad i \ne j, \quad 0 \le i, j \le n - 1. \tag{5.52}$$

This lemma is proved in the same way as estimates (5.9) and (5.10) for method (4.48) in the subsection "Study of Properties of Different Algorithms"; only the order of smallness of some quantities is determined more precisely. Therefore, we abstain from giving a detailed proof and shall dwell only on the changes involved.
With $i = 1$

$$(e_1, r_0) = (r_1, e_0) + (r_1, (f''_{1c} - f''_{0c}) r_0).$$

Taking (2.8) into account, we obtain

$$\|f''_{1c} - f''_{0c}\| = \|f''(x_1 + \theta_1 r_1) - f''(x_0 + \theta_0 r_0)\| \le R(\|x_1 - x_0\| + \|r_1\| + \|r_0\|) \le R(2\|r_0\| + \|r_1\|).$$

By (5.50) and (5.45), we have

$$\|r_1\| \le \|x_2 - x_*\| + \|x_1 - x_*\| \le C\|x_1 - x_*\| \le C\|f'_1\|. \tag{5.53}$$

Using (5.53) and (5.49) we establish that $\|r_1\| \le C\|f'_1\| \le C\|f'_0\|$. Taking into account also that $\|r_0\| \ge C\|f'_0\|$ we obtain $\|r_1\| \le C\|r_0\|$. Consequently, $\|f''_{1c} - f''_{0c}\| \le C\|r_0\|$. Using this we find that

$$|(e_1, r_0)| \le C\|r_0\|^2\,\|r_1\|.$$

Suppose that estimates (5.9) and (5.52) hold with $0 \le i, j \le \tau < n - 1$. Then since $(f'_{j+1}, r_j) = 0$, we have

$$|(f'_{\tau+1}, r_j)| = |(e_\tau + \cdots + e_{j+1}, r_j)| \le C\|r_j\|^2\,\|r_{j+1}\|, \quad 0 \le j < \tau. \tag{5.54}$$

Hence, using (5.53),

$$|(f'_{\tau+1}, r_j)| \le C\|r_j\|\,\|f'_j\|\,\|f'_{j+1}\|. \tag{5.55}$$

If (5.51) is satisfied, then $\|f'_j\|\,\|f'_{j+1}\| \le \lambda_\xi\|f'_{\tau+1}\|$. Making use of this inequality we obtain from (5.55)

$$|(f'_{\tau+1}, r_j)| \le \lambda_\xi\|r_j\|\,\|f'_{\tau+1}\|, \quad 0 \le j < \tau.$$

Since $(f'_{\tau+1}, r_\tau) = 0$, we finally obtain

$$|(f'_{\tau+1}, r_j)| \le \lambda_\xi\|r_j\|\,\|f'_{\tau+1}\|, \quad 0 \le j \le \tau. \tag{5.56}$$
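Using estimates (5.27), (5.56) and (5.25) we obtain also:

$$\frac{(H_j f'_j, f'_{\tau+1})^2}{(H_j e_j, e_j)} \le \frac{(f'_{\tau+1}, r_j)^2}{\alpha_j m\|r_j\|^2} \le \lambda_\xi\|f'_{\tau+1}\|^2, \quad 0 \le j \le \tau, \tag{5.57}$$

$$\frac{|(H_j f'_{j+1}, f'_{\tau+1})(H_j f'_j, f'_{\tau+1})|}{(H_j e_j, e_j)} \le \frac{L\|f'_{j+1}\|\,\|f'_{\tau+1}\|\,|(f'_{\tau+1}, r_j)|}{m\|r_j\|^2} \le \frac{\lambda_\xi\|f'_{j+1}\|\,\|f'_{\tau+1}\|^2}{\|r_j\|}. \tag{5.58}$$

Taking into account that $\|r_j\| \ge C\|f'_j\|$, $0 \le j \le \tau$, and using (5.53), (5.49) we obtain

$$\|r_i\| \le C\|f'_i\| \le C\|f'_j\| \le C\|r_j\|, \quad 0 \le j \le i \le \tau. \tag{5.59}$$

Making use of (5.59) we have

$$\frac{\lambda_\xi\|f'_{j+1}\|\,\|f'_{\tau+1}\|^2}{\|r_j\|} \le \lambda_\xi\|f'_{\tau+1}\|^2, \quad 0 \le j \le \tau. \tag{5.60}$$

Using (5.57), (5.58) and (5.60) we establish, in the same way as in the subsection "Study of Properties of Different Algorithms", that

$$(H_{\tau+1} f'_{\tau+1}, f'_{\tau+1}) \ge a_{\tau+1}\|f'_{\tau+1}\|^2.$$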


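Let us show now that

$$H_{\tau+1} e_j = r_j + \eta_{j,\tau+1}, \quad 0 \le j \le \tau, \tag{5.61}$$

where

$$\|\eta_{j,\tau+1}\| \le C\sum_{\nu=j+1}^{\tau}\frac{\|r_j\|^2\,\|r_{j+1}\|}{\|r_\nu\|}, \quad 0 \le j < \tau, \qquad \|\eta_{\tau,\tau+1}\| = 0. \tag{5.62}$$

Indeed, since

$$H_{s+1} e_j = H_s e_j + \frac{r_s (r_s, e_j)}{(r_s, e_s)} - \frac{(e_s, H_s e_j)\,H_s e_s}{(H_s e_s, e_s)}, \tag{5.63}$$

we have $H_{j+1} e_j = r_j$. Using estimates (5.52), which hold by assumption with $0 \le s, j \le \tau$, (5.25), (5.27), (5.47) and taking into account that $|\alpha_i| \le C$, $0 \le i \le \tau$, we obtain with $s = j + 1$

$$\frac{\|(e_{j+1}, H_{j+1} e_j)\,H_{j+1} e_{j+1}\|}{(H_{j+1} e_{j+1}, e_{j+1})} = \frac{\|(e_{j+1}, r_j)\,H_{j+1} e_{j+1}\|}{(H_{j+1} e_{j+1}, e_{j+1})} \le C\frac{\|r_j\|^2\,\|r_{j+1}\|}{\|r_{j+1}\|}.$$

Noting also that $(r_{j+1}, e_j) = 0$ we obtain from (5.63):

$$H_{j+2} e_j = r_j + \eta_{j,j+2}, \quad \|\eta_{j,j+2}\| \le C\frac{\|r_j\|^2\,\|r_{j+1}\|}{\|r_{j+1}\|}.$$

Suppose that for a certain $j + 1 < s \le \tau$ we have

$$H_s e_j = r_j + \eta_{js}, \quad \|\eta_{js}\| \le C\sum_{\nu=j+1}^{s-1}\frac{\|r_j\|^2\,\|r_{j+1}\|}{\|r_\nu\|}.$$

Then because of the same conditions used with $s = j + 1$ we have:

$$\frac{\|(e_s, H_s e_j)\,H_s e_s\|}{(H_s e_s, e_s)} \le C\frac{\|r_j\|^2\,\|r_{j+1}\|}{\|r_s\|} + C\sum_{\nu=j+1}^{s-1}\frac{\|r_j\|^2\,\|r_{j+1}\|}{\|r_\nu\|},$$

$$\frac{\|r_s (r_s, e_j)\|}{(r_s, e_s)} \le C\frac{\|r_j\|^2\,\|r_{j+1}\|}{\|r_s\|}.$$

Using these estimates in (5.63) we establish that

$$H_{s+1} e_j = r_j + \eta_{j,s+1}, \quad \|\eta_{j,s+1}\| \le C\sum_{\nu=j+1}^{s}\frac{\|r_j\|^2\,\|r_{j+1}\|}{\|r_\nu\|}.$$

Thus, by induction, we can consider (5.61) to hold.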
We can prove now that estimates (5.52) hold with $i = \tau + 1$. Making use of (5.61) we have

$$(r_{\tau+1}, e_j) = -\alpha_{\tau+1}(H_{\tau+1} f'_{\tau+1}, e_j) = -\alpha_{\tau+1}(f'_{\tau+1}, r_j + \eta_{j,\tau+1}). \tag{5.64}$$

By (5.59) and (5.49), we have

$$\|r_\nu\| \ge C\|f'_\nu\| \ge C\|f'_{\tau+1}\|, \quad 0 \le \nu \le \tau. \tag{5.65}$$

Taking this into account and using (5.62), we obtain $|(f'_{\tau+1}, \eta_{j,\tau+1})| \le C\|r_j\|^2\,\|r_{j+1}\|$. Using this inequality and also estimates (5.54) in (5.64) and taking into account that $\alpha_{\tau+1} \le C$, we find that

$$|(r_{\tau+1}, e_j)| \le C\|r_j\|^2\,\|r_{j+1}\|, \quad 0 \le j \le \tau. \tag{5.66}$$

Further,

$$(e_{\tau+1}, r_j) = (r_{\tau+1}, e_j) + \big(r_{\tau+1}, (f''(x_{\tau+1} + \theta_{\tau+1} r_{\tau+1}) - f''(x_j + \theta_j r_j))\,r_j\big). \tag{5.67}$$

By (5.65) and (5.53),

$$\|r_{\tau+1}\| \le C\|r_j\|, \quad 0 \le j \le \tau.$$

Consequently, using (2.8) we have

$$\|f''(x_{\tau+1} + \theta_{\tau+1} r_{\tau+1}) - f''(x_j + \theta_j r_j)\| \le R(\|x_{\tau+1} - x_j\| + \|r_{\tau+1}\| + \|r_j\|) \le C\|r_j\|.$$

Using this estimate and (5.66) in (5.67) we obtain:

$$|(e_{\tau+1}, r_j)| \le C\|r_j\|^2\,\|r_{j+1}\|, \quad 0 \le j \le \tau.$$

Thus one step of induction has been completed, i.e. it has been established that estimates (5.9) and (5.52) hold. The proof of the lemma is completed.
Theorem 5.2. Let $f(x)$ be a twice continuously differentiable function and let matrix $f''(x)$ satisfy conditions (5.3) and (2.8). If $f(x)$ is minimized by algorithm {(5.1), (4.48)}, then for any sufficiently large $\xi$ the following estimate holds:

$$\|x_{(\xi+1)n} - x_*\| \le C\|x_{\xi n+1} - x_*\|\,\|x_{\xi n} - x_*\|. \tag{5.68}$$

Proof. By (5.45), estimate (5.68) is equivalent to

$$\|f'_n\| \le C\|f'_1\|\,\|f'_0\|. \tag{5.69}$$

Suppose that estimate (5.69) does not hold for all sufficiently large $\xi$. Then there must exist an infinite subsequence $\{\xi_m\}$ such that for the corresponding points the following inequalities hold:

$$\|f'_0\|\,\|f'_1\| \le \lambda_{\xi_m}\|f'_n\|, \quad \lambda_{\xi_m} \to 0 \text{ as } \xi_m \to \infty. \tag{5.70}$$
Without loss of generality, it can be assumed that the subsequence $\{\xi_m\}$ coincides with the whole sequence $\xi = 0, 1, \ldots$. Taking into account (5.49) we can ascertain that if (5.70) holds, estimates (5.51) hold too. Consequently, if we assume estimates (5.70) to be satisfied, then the requirements of theorem 5.2 provide for the fulfilment of the conditions of lemma 5.1. Thus, if (5.70) is fulfilled, the estimates (5.9) and (5.52) hold.

Taking this into account we have

$$|(f'_n, r_j)| = |(e_{n-1} + \cdots + e_{j+1}, r_j)| \le C\|r_j\|^2\,\|r_{j+1}\|, \quad 0 \le j \le n - 2.$$

Now, in a way analogous to that used in establishing (5.56), we can show that

$$|(f'_n, r_j)| \le \lambda_\xi\|f'_n\|\,\|r_j\|, \quad \lambda_\xi \to 0, \quad 0 \le j \le n - 1. \tag{5.71}$$


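Let us demonstrate now that if (5.51) holds, then the system $r_0, \ldots, r_{n-1}$ is linearly independent. Note first of all that, due to estimates (5.9), it follows from (5.51) that

$$\|r_i\|\,\|r_{i+1}\| \le \lambda_\xi\|r_j\|, \quad 0 \le i < j \le n - 1. \tag{5.72}$$

Making use of (5.72) and (5.47), estimates (5.52) can take the following form:

$$|(r_i, e_j)| \le \lambda_\xi\|r_i\|\,\|r_j\| \le \lambda_\xi\|r_i\|\,\|e_j\|, \quad \lambda_\xi \to 0, \quad 0 \le i \ne j \le n - 1. \tag{5.73}$$

We denote $\bar r_i = r_i / \|r_i\|$ and let

$$\|\varphi\| = \min_{\sum |\beta_i| = 1}\Big\|\sum_{i=0}^{n-1}\beta_i \bar r_i\Big\| = \Big\|\sum_{i=0}^{n-1}\bar\beta_i \bar r_i\Big\|.$$

Then

$$|(\varphi, e_j)| \ge |\bar\beta_j(\bar r_j, e_j)| - \Big|\sum_{i \ne j}\bar\beta_i(\bar r_i, e_j)\Big|. \tag{5.74}$$

Since $\sum |\bar\beta_i| = 1$, we have $|\bar\beta_j| \ge \bar\beta > 0$ at least for one index $j$, $0 \le j \le n - 1$. Then by (5.18) and (5.47) we have

$$|\bar\beta_j(\bar r_j, e_j)| \ge C\|r_j\| \ge C\|e_j\|.$$

With $i \ne j$, making use of (5.73), we have

$$|\bar\beta_i(\bar r_i, e_j)| \le \lambda_\xi|\bar\beta_i|\,\|\bar r_i\|\,\|e_j\| = \lambda_\xi|\bar\beta_i|\,\|e_j\|, \quad \lambda_\xi \to 0 \text{ as } \xi \to \infty.$$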
Using the inequalities obtained in (5.74) we have with sufficiently large $\xi$ that $|(\varphi, e_j)| \ge C\|e_j\|$, i.e.

$$\|\varphi\| \ge C. \tag{5.75}$$

Hence, it follows that the system of vectors $r_0, \ldots, r_{n-1}$ is linearly independent. Besides, using (5.75) it is easy to establish that the following statement holds: if $\psi_0, \ldots, \psi_{n-1}$ is a system biorthogonal to $r_0, \ldots, r_{n-1}$, then with sufficiently large $\xi$ we have

$$\|r_i\|\,\|\psi_i\| \le C, \quad 0 \le i \le n - 1. \tag{5.76}$$

Finally, it can be ascertained that under conditions (5.71) and (5.76) the system of vectors $f'_n, r_0, \ldots, r_{n-1}$ is also linearly independent. Indeed, suppose that

$$f'_n = \sum_{i=0}^{n-1}\nu_i\psi_i = \sum_{i=0}^{n-1}(f'_n, r_i)\,\psi_i.$$

Then by (5.71) and (5.76) we obtain

$$\|f'_n\| \le C\lambda_\xi\|f'_n\|.$$

Since $\lambda_\xi \to 0$, the last inequality cannot be satisfied with sufficiently large $\xi$. Hence, it follows that the system $f'_n, r_0, \ldots, r_{n-1}$ is linearly independent.

Thus, having assumed that estimate (5.68) does not hold for any $\xi > \xi_0$ (where $\xi_0$ is a certain sufficiently large number), we have proved that a system of $n + 1$ vectors $f'_n, r_0, \ldots, r_{n-1}$ in the space $E^n$ is linearly independent. However, this is impossible. Thus the initial assumption was wrong, i.e. estimate (5.68) holds.

The theorem is proved. 


Discussion of Results 


Thus we have made it clear that all of the methods studied in Sec. 4 
can be applied for minimizing nonquadratic functions and that the 
convergence of the processes can be guaranteed for a class of functions 
that can be minimized by gradient methods. In the case where meth- 
ods of conjugate directions are used for minimization of strongly 
convex functions, the rate of convergence proves not slower than 
superlinear. 

The rate of convergence of methods of conjugate directions was established in a somewhat different manner than that of methods of other classes studied in the preceding sections: the sequence considered was $\{x_{\xi n}\}$ rather than $\{x_k\}$, i.e. we actually considered as one iteration a unified group of $n$ usual iterations of the process, $x_{\xi n}, x_{\xi n+1}, \ldots, x_{\xi n+n-1}$. Generally speaking, the real rate of convergence of such processes may prove slower than that of methods of dual directions (Sec. 3), and all the more so than that of Newton's method (Sec. 2) (i.e. the decrease of the function value at each iteration $|f_{k+1} - f_k|$ in methods of the class under consideration may prove less than in the methods of Secs. 2, 3, and the ratio $\|x_{k+1} - x_*\| / \|x_k - x_*\|$ greater). Thus, if for instance in some algorithm we have

$$x_{k+1} - x_k = -D_k f'_k \tag{5.77}$$

and in a method of conjugate directions

$$x_{(\xi+1)n} - x_{\xi n} = -D_{\xi n} f'_{\xi n}$$

and $D_k = D_{\xi n} \to (f'')^{-1}$, then this means that $n$ iterations of the method of conjugate directions are equivalent, as to their convergence, to only one iteration of process (5.77). Nevertheless the rate of convergence of the methods of the class under consideration is in practice rather fast and exceeds by far that of gradient methods.

At the same time, as mentioned in Sec. 4, the methods of conjuga- 
te directions differ but slightly from the gradient methods as to the 
labour per iteration. 

The foregoing makes it possible to conclude that the methods of conjugate directions are among the most effective for solving minimization problems.

In this section we limited ourselves to the study of several concrete algorithms constructed in Sec. 4, though we could have studied also the properties of other algorithms that can be constructed according to the general scheme discussed in Sec. 4. However, the technique of studying other algorithms would not considerably differ from that used in Sec. 5. Indeed, the difference in the technique of proving theorem 5.4 amounts only to somewhat different ways of investigating the properties of matrix $H_k$. But in any method of the class under consideration, the vectors $u_k$ and $v_k$ used for constructing $H_{k+1}$ can only be various combinations of the vectors $r_k$ and $H_k e_k$ (see (4.32)), and the algorithms discussed in Secs. 4, 5 were chosen so as to use in constructing matrices $H_k$ various combinations of these elements.

Using the results obtained, we now compare the properties of 
different algorithms in the minimization of nonquadratic functions. 

The results of theorem 5.2 (estimate (5.68)) show that the rate of convergence of the sequence $\{x_{\xi n}\}$ depends considerably on the properties of matrix $H_{\xi n}$. If, as $\xi \to \infty$,

$$H_{\xi n} \to (f''_{\xi n})^{-1}, \tag{5.78}$$

then

$$\frac{\|x_{\xi n+1} - x_*\|}{\|x_{\xi n} - x_*\|} \to 0$$

and the rate of convergence increases. This fact is of the greatest practical interest for algorithms having the property that in minimizing a quadratic function we have

$$H_n = A^{-1}. \tag{5.79}$$

Algorithms (4.48), (4.49), (4.52) belong to methods of this group. If in implementing one of such algorithms condition (5.78) is fulfilled, then, by the above considerations, it is expedient not to restore matrix $H_k$.
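
Property (5.79) can be observed directly on a two-dimensional quadratic. The sketch below is an illustration under stated assumptions: the update is taken in the form shown in (5.38), which is read here as formula (4.48); the step length comes from exact minimization along each direction; the matrix `A`, the vector `b`, the start point and the helper names are made up.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def dfp_update(H, r, e):
    """H + r r^T / (r, e) - (H e)(H e)^T / (H e, e), for symmetric H."""
    He = matvec(H, e)
    re, eHe = dot(r, e), dot(He, e)
    n = len(r)
    return [[H[i][j] + r[i] * r[j] / re - He[i] * He[j] / eHe
             for j in range(n)] for i in range(n)]

# Illustrative quadratic f(x) = 0.5 (A x, x) - (b, x); A and b are assumptions.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]

def grad(x):
    return [ax - bi for ax, bi in zip(matvec(A, x), b)]

x = [0.0, 0.0]
H = [[1.0, 0.0], [0.0, 1.0]]                       # H_0 = I
for _ in range(2):                                  # n = 2 steps
    g = grad(x)
    p = [-v for v in matvec(H, g)]                  # p_k = -H_k f'_k
    alpha = -dot(g, p) / dot(p, matvec(A, p))       # exact line search
    r = [alpha * v for v in p]                      # r_k = x_{k+1} - x_k
    x_new = [xi + ri for xi, ri in zip(x, r)]
    e = [a - c for a, c in zip(grad(x_new), g)]     # e_k = f'_{k+1} - f'_k
    H = dfp_update(H, r, e)
    x = x_new

A_inv = [[3.0 / 11.0, -1.0 / 11.0], [-1.0 / 11.0, 4.0 / 11.0]]
x_star = [1.0 / 11.0, 7.0 / 11.0]                   # solution of A x = b
```

After the two steps the accumulated matrix coincides with the inverse of `A` and the iterate with the minimizer, which is why keeping $H_k$ instead of restoring it can pay off once the function is close to quadratic.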

In method (4.70), property (5.79) is not fulfilled; therefore the variant of this method without restoration gives no advantage (as far as the rate of convergence is concerned) over the variant with restoration. The same can be said also of other algorithms that have the property that in minimizing a quadratic function we have $H_n = H_0$ (for instance, methods (4.69), (4.71)) or $H_n$ is close to $H_0$ (this is just the case of matrix $H_n$ in method (4.70): its effect on the system of linearly independent vectors $e_0, \ldots, e_{n-1}$ is the same as that of matrix $H_0$, except on the vector $e_{n-1}$). Therefore it is not worth while to consider variants of such methods without restoration of matrix $H_k$.

However, the rate of convergence of methods (4.70), (4.71) will increase if we use, instead of the fixed matrix $H_0$, a sequence of positive definite matrices $H_{\xi n}$ which satisfy the condition

$$H_{\xi n} \to (f''_{\xi n})^{-1}. \tag{5.80}$$

In cases where the requirements of lemma 5.1 are met, matrices $H_{\xi n}$ which satisfy condition (5.80) can be constructed using the vectors $r_{\xi n}, r_{\xi n+1}, \ldots, r_{\xi n+n-1}$ by the formula

$$H_{(\xi+1)n} = \sum_{i=0}^{n-1}\frac{r_{\xi n+i}\,r_{\xi n+i}^{T}}{(r_{\xi n+i}, e_{\xi n+i})}.$$


In the light of the above considerations, the most effective methods 
of the class of methods of conjugate directions from the viewpoint of 
the rate of convergence in the minimization of strictly convex functions
should be methods that have property (5.79).

In practice, deviations from this conclusion can of course occur in
the sense that, for example, using method (4.70) the solution of the
minimization problem (to a given accuracy) can be obtained after
a smaller number of iterations than with method (4.48), say.
The fact, as we stressed many times, is that the rate of convergence
of any method is affected by many additional factors, for instance,
by errors in calculations and by the values of $\alpha_k$ being chosen
not precisely enough; moreover, the sensitivity of the methods to
perturbations is different. Besides, the comparison of the rates of
convergence makes sense only in a sufficiently small region about the
minimum, and at a point distant from it one can compare the
effectiveness of different algorithms only on the basis of numerical
experiments.

Many works published up to the present time (J.D. Pearson, 
J. Greenstadt, B.T. Polyak [2], H.G. Huang and A.V. Levy) 
contain results of numerical solution of various problems by meth- 
ods of conjugate directions. The most comprehensive comparative 
analysis of the effectiveness of different algorithms is given in the 
last of the works named. On the whole, the results of numerical 
experiments confirm the conclusion that the most effective methods 
are those for which condition (5.79) is fulfilled. At the same time, 
method (4.71) proves more effective in the case where the matrix 
is restored after n iterations (as compared with a process without 
restoration). It seems that in practice method (4.70) should also
be used with restoration of matrix $H_k$.

Finally, we dwell on problems that are involved in the choice of
the step length in methods of the class under consideration. As was
already mentioned, in methods of conjugate directions the step
length is chosen so as to minimize the function along the direction
of motion. It was stressed many times that the main shortcoming of
such a procedure is the large number of function evaluations it
requires, which makes the computational effort very considerable in
problems in which the function evaluation requires much time. In
some cases this way of choosing the step length is not practical,
for instance if the value of parameter $\alpha_k$ changes greatly at
every step. This shortcoming of the method of conjugate directions
was stressed in many works (C.G. Broyden [2], B.N. Pshenichny [3],
W.C. Davidon [2], M.J.D. Powell [1], R. Fletcher [4], and others).
To avoid the above shortcoming, these works consider methods in
which the $\alpha_k$ value is chosen so that it guarantees only a certain
degree of decrease of the function. In other respects, however, the
construction of these methods is based on the same ideas which were
described above (the work of B.N. Pshenichny [3] excepted).

The study of the properties of methods in which the choice of the
step length is not connected with the finding of the function minimum
along the direction of motion becomes much more difficult, and the
theoretical substantiation of many of them has not been performed
even in the case of the minimization of a quadratic function.

From the viewpoint of the method of choosing the $\alpha_k$ value,
methods of dual directions, Sec. 3, are preferable. The rate of
convergence of these methods also proves faster. However, methods of dual
directions require a greater storage capacity of the computer (as
noted in Sec. 3, two $n \times n$ matrices must be stored); therefore
with them one can solve only minimization problems of smaller size.
One can, though, use a smaller storage capacity of the computer by
choosing in methods of dual directions vectors $r_k$ along the
coordinate axes; however, in this case it is necessary to calculate the
derivative twice at every iteration and this increases the amount of
work required.


6. METHODS WITHOUT 
CALCULATING DERIVATIVES 


Introductory Remarks 


Until now we described minimization methods in which it was
necessary at each iteration to calculate, besides the function $f(x)$,
its gradient $f'(x)$ (methods of Secs. 1, 3, 4, 5), and in Newton's
method (Sec. 2), moreover, the matrix of second derivatives $f''(x)$.
Many times we stressed the fact that the calculation of second deri- 
vatives is often the most complicated and laborious part of the 
construction of the iterative process, and methods of Secs. 3-5 were 
worked out just with the aim of avoiding the calculation of second 
derivatives. However in many problems, the calculation of the gra- 
dient can also prove considerably more complicated than the evalu- 
ation of the function (in some cases it is impossible to obtain an 
analytical expression of f’ (x) at all). In such cases it is desirable to 
use methods which require only the function evaluation. 

The calculation of a gradient by an analytical formula can be 
substituted by an approximate one, for instance, by using the 
finite differences approximation to partial derivatives. In this way 
one can construct modifications of the methods (discussed in the 
preceding sections) which involve only function evaluation. If we 
require a definite degree of accuracy of the approximation and impose 
certain additional requirements on the construction of an iterative 
process, then in most cases the properties of such
modified methods (convergence, rate of convergence) approximate
the properties of the original algorithms in which $f'(x)$, $f''(x)$ are
evaluated by analytical expressions.

The study of methods without calculation of gradients is interesting
also in another respect. In determining the accuracy of approximating
the derivatives with which the properties of such algorithms coincide
with those of the original methods, we find in fact the allowable
calculation errors that do not violate the properties of the
algorithms (with the calculation of $f'(x)$, $f''(x)$).

In this section we are studying only those algorithms whose con- 
struction is based on methods of dual directions, Sec. 3; in this 
connection we retain for them the same name. Besides, we shall 
dwell on algorithms of another type in which the idea of the construc- 
tion of conjugate directions is realized but without the calculation of 
the gradient or its approximation by finite differences.


Constructing Methods of Dual Directions


In these methods, successive approximations to the solution are
constructed by the formula

$$x_{k+1} = x_k - \alpha_k D_k^{-1} g_k \tag{6.1}$$

where $D_k$ is an $n \times n$ matrix and $g_k$ is a vector. The scalar factor $\alpha_k$,
which determines the step length, as distinct from the methods
discussed before, can take positive as well as negative values; this depends
on which direction, $-D_k^{-1} g_k$ or $D_k^{-1} g_k$, is the direction of descent of
function $f(x)$.

One can also use another approach and assume that $\alpha_k > 0$, but
then the direction of motion should be either $p_k = -D_k^{-1} g_k$ or
$p_k = D_k^{-1} g_k$, so that the following condition holds:

$$(f'_k, p_k) < 0. \tag{6.2}$$


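We assume as we did in Sec. 3 that $f(x)$ is a twice continuously
differentiable strongly convex function.

Constructing matrix $D_k$ and vector $g_k$. Let us determine vectors
$\theta_k$, $\varphi_k$ and $\psi_k$:

$$\theta_k = \left( \frac{f(x_k + \mu_k v_1) - f(x_k)}{\mu_k}, \ldots, \frac{f(x_k + \mu_k v_n) - f(x_k)}{\mu_k} \right),$$

$$\varphi_k = \left( \frac{f(y_k + \mu_k v_1) - f(y_k)}{\mu_k}, \ldots, \frac{f(y_k + \mu_k v_n) - f(y_k)}{\mu_k} \right),$$

$$\psi_k = \varphi_k - \theta_k,$$

where $0 < |\mu_k| \le \|r_k\|^{\tau}$, $\tau > 1$, $y_k$, $r_k$ are elements of sequence
(3.5), and $v_j$ is the unit vector of the $j$-th coordinate axis.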

Lemma 6.1. Let $\{x_k\}$ be a bounded sequence, $\|x_{k+1} - x_k\| \to 0$ as
$k \to \infty$, and let matrix $D_k$ with any $k \ge n - 1$ be defined by the
following system of equations:

$$D_k r_{k-i} = \psi_{k-i}, \quad i = 0, 1, \ldots, n - 1, \tag{6.3}$$

where $r_{k-i}$ are elements of sequence (3.5). Then

$$\lim_{k \to \infty} \|D_k - f''_k\| = 0.$$


The proof of this lemma coincides in its essential features with
that of lemma 3.1. We shall consider only the arising differences.
The components of vectors $\theta_k$ and $\varphi_k$ can be written thus:

$$\theta_k^j = \left. \frac{\partial f}{\partial x^j} \right|_{x = x_k + \vartheta_j \mu_k v_j}, \quad 0 < \vartheta_j < 1,$$

$$\varphi_k^j = \left. \frac{\partial f}{\partial x^j} \right|_{x = y_k + \bar{\vartheta}_j \mu_k v_j}, \quad 0 < \bar{\vartheta}_j < 1,$$

$$j = 1, \ldots, n.$$

Taking into account this and the continuity of second derivatives
of the function, it is easy to ascertain that the following estimates
hold:

$$\|\theta_k - f'(x_k)\| \le C_1 |\mu_k| n^{1/2} \le C_2 \|r_k\|^{\tau}, \tag{6.4}$$

$$\|\varphi_k - f'(y_k)\| \le C_3 |\mu_k| n^{1/2} \le C_4 \|r_k\|^{\tau}, \tag{6.5}$$

where $C_2, C_4 < \infty$.
Let us write vector $\psi_{k-i}$ in the form

$$\psi_{k-i} = f'(y_{k-i}) - f'(x_{k-i}) + (\varphi_{k-i} - f'(y_{k-i})) - (\theta_{k-i} - f'(x_{k-i}));$$

then denoting as before $e_{k-i} = f'(y_{k-i}) - f'(x_{k-i})$ we obtain

$$D_k r_{k-i} = e_{k-i} + (\varphi_{k-i} - f'(y_{k-i})) - (\theta_{k-i} - f'(x_{k-i})), \quad i = 0, 1, \ldots, n - 1. \tag{6.6}$$


Let us take $B_k = D_k - f''(x_k)$. Proceeding as in proving lemma 3.1
we obtain the following estimate:

$$\|B_k r_{k-i}\| \le h_{k-i} \|r_{k-i}\| + \|\theta_{k-i} - f'(x_{k-i})\| + \|\varphi_{k-i} - f'(y_{k-i})\|$$

where $h_{k-i} \to 0$ as $k \to \infty$. Whence, due to (6.4) and (6.5), we have

$$\|B_k r_{k-i}\| \le h_{k-i} \|r_{k-i}\| + C_5 \|r_{k-i}\|^{\tau}, \quad C_5 < \infty,$$

or

$$\|B_k r_{k-i}\| \le h'_{k-i} \|r_{k-i}\|$$

where $h'_{k-i} = h_{k-i} + C_5 \|r_{k-i}\|^{\tau - 1} \to 0$ as $k \to \infty$.
The remaining part of the proof repeats the argument of lemma 3.1.
Let us now determine vector $g_k$:

$$g_k = \left( \frac{f(x_k + \rho_k v_1) - f(x_k)}{\rho_k}, \ldots, \frac{f(x_k + \rho_k v_n) - f(x_k)}{\rho_k} \right) \tag{6.7}$$

where $|\rho_k| \le |\mu_k|$ (if $\rho_k = \mu_k$, then $g_k = \theta_k$).
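
As an illustration of definition (6.7), a minimal Python sketch of the forward-difference vector $g_k$ might look as follows. The function name `forward_difference_gradient`, the test function `example_f` and the step value are assumptions made for this example only; they do not come from the original text.

```python
def forward_difference_gradient(f, x, rho):
    """Forward-difference analogue of f'(x) in the sense of (6.7).

    Component j is (f(x + rho*v_j) - f(x)) / rho, where v_j is the unit
    vector of the j-th coordinate axis. Costs n + 1 function evaluations.
    """
    fx = f(x)
    g = []
    for j in range(len(x)):
        shifted = list(x)
        shifted[j] += rho
        g.append((f(shifted) - fx) / rho)
    return g


# Hypothetical test function (an assumption made for this example only).
def example_f(x):
    return x[0] ** 2 + 3.0 * x[0] * x[1] + 5.0 * x[1] ** 2


x_k = [1.0, -2.0]
exact_gradient = [2.0 * x_k[0] + 3.0 * x_k[1], 3.0 * x_k[0] + 10.0 * x_k[1]]
g_k = forward_difference_gradient(example_f, x_k, rho=1e-4)
errors = [abs(a - b) for a, b in zip(g_k, exact_gradient)]
```

For this quadratic example the error of each component is proportional to the step `rho`, which is the kind of behaviour an estimate of the form $\|g_k - f'_k\| \le C |\rho_k|$ describes.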

It is clear that the convergence and the rate of convergence of
sequence (6.1) depend not only on the value of matrix $D_k$ but also on
how closely vector $g_k$ approximates gradient $f'(x_k)$. It will become
clear from what follows that in order to guarantee a fast rate of
convergence of sequence (6.1) to the solution, it is required that with
any $k$ the inequalities

$$0 < |\rho_k| \le \xi_k \|p_k\| \tag{6.8}$$

where $\xi_k \to 0$ in an arbitrary manner as $k \to \infty$, be satisfied.

If at a certain iteration the chosen value of $\rho_k$ does not satisfy
conditions (6.8), it becomes necessary to take a smaller $\rho_k$, calculate
a new vector $g_k$, and then calculate a new vector $p_k$ and check
once more whether (6.8) is satisfied or not. Since $g_k \to f'_k$ as $|\rho_k| \to 0$
and at the same time $\|p_k\| \to \|D_k^{-1} f'_k\|$, and $\|D_k^{-1} f'_k\| > 0$ with any
$x_k \ne x_*$ (matrix $D_k^{-1}$ is nonsingular being the inverse of matrix $D_k$;
on the calculation of $D_k^{-1}$ see the subsection on p. 111, calculation of
vector $p_k$), conditions (6.8) are satisfied with sufficiently small
values of $\rho_k$.

Determining the direction of motion. This is made as follows.
Setting a certain value of $\gamma_0$ (naturally this should be chosen
sufficiently small), $f(x)$ is evaluated at points $x_k \pm \gamma_0 D_k^{-1} g_k$. If at one of
these points the function value is less than $f(x_k)$, then the
corresponding vector ($-D_k^{-1} g_k$ or $D_k^{-1} g_k$) is taken as $p_k$ (condition (6.2) is
satisfied since $f(x)$ is convex). However, if both function values are
greater than $f(x_k)$, we reduce $\gamma$ until one of the function values
becomes less than $f(x_k)$, and the corresponding vector is taken as $p_k$.

However, it can occur that with small values of $\gamma$ the function
does not decrease in either of the directions $\pm D_k^{-1} g_k$. This can mean
that either we have not reached values of $\gamma$ with which the function
decreases or the condition $(f'_k, D_k^{-1} g_k) = 0$ is satisfied (it will be seen
from what follows that such a case is possible only at the initial
stage of the process and then, obviously, neither of the vectors
$\pm D_k^{-1} g_k$ can be chosen as $p_k$). In order to exclude such an occurrence
it is necessary to calculate a new vector $g_{k,1}$, having changed $\rho_k$ (but
so that conditions (6.8) be satisfied), calculate a new vector $D_k^{-1} g_{k,1}$,
and from a certain $\gamma < \gamma_0$ on, evaluate the function at points
$x_k \pm \gamma D_k^{-1} g_{k,1}$ as well. If $x_k \ne x_*$, then one of the directions $\pm D_k^{-1} g_k$
or $\pm D_k^{-1} g_{k,1}$ is, of necessity, the direction of descent. The
corresponding vector is then taken as $p_k$.
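
The sign test described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the name `choose_direction`, the halving factor `shrink`, the cap `max_reductions` and the convention of returning `None` when no decrease is found are all hypothetical choices, and the recomputation of a second vector $g_{k,1}$ is left to the caller.

```python
def choose_direction(f, x, d, gamma0=1.0, shrink=0.5, max_reductions=20):
    """Pick p = -d or p = +d, where d stands for D_k^{-1} g_k.

    f is evaluated at x - gamma*d and x + gamma*d; gamma is reduced until
    one of the two values is below f(x). Returns None if no decrease is
    found, in which case the text recommends recomputing g_k.
    """
    fx = f(x)
    gamma = gamma0
    for _ in range(max_reductions):
        for sign in (-1.0, 1.0):
            trial = [xi + sign * gamma * di for xi, di in zip(x, d)]
            if f(trial) < fx:
                return [sign * di for di in d]
        gamma *= shrink
    return None


def sphere(x):  # hypothetical test function
    return x[0] ** 2 + x[1] ** 2


p_down = choose_direction(sphere, [1.0, 1.0], [1.0, 1.0])
p_flip = choose_direction(sphere, [1.0, 1.0], [-1.0, -1.0])
p_none = choose_direction(sphere, [1.0, 1.0], [1.0, -1.0])
```

In the third call the vector `d` is orthogonal to the gradient of the test function, so neither sign gives a decrease; this mirrors the case $(f'_k, D_k^{-1} g_k) = 0$ discussed in the text.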

The algorithm of choosing the step. Let us choose $\alpha_k$ in the
following way: suppose that

$$\bar{\alpha}_k = \min \left\{ 1, \; R \, \frac{|(g_k, p_k)|}{\|p_k\|^3} \right\} \tag{6.9}$$

where $0 < R < \infty$, and check the validity of the inequality

$$f(x) - f(x_k) \le \varepsilon \alpha^2 \beta_k (g_k, p_k) \tag{6.10}$$

where $x = x_k + \alpha p_k$, $\beta_k = -\operatorname{sgn}(g_k, p_k)$, $0 < \varepsilon < \frac{1}{2}$.

If (6.10) holds with $\alpha = \bar{\alpha}_k$, then the value $\bar{\alpha}_k$ is taken as the
required one, and if not, we reduce $\alpha$ until (6.10) is satisfied; the value
of $\alpha_k$ thus obtained is taken to be the one sought.

This method of choosing $\alpha_k$ presupposes, of course, that $(g_k, p_k) \ne 0$.
If at a certain iteration we find that $(g_k, p_k) = 0$ (this can occur
only at the initial stage of the process), then it is necessary to reduce
$\rho_k$ and calculate vector $g_k$ anew.
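
A minimal Python sketch of this step-length rule is given below. It assumes that the trial value is $\min\{1, R\,|(g_k, p_k)| / \|p_k\|^3\}$ and that the acceptance test is $f(x_k + \alpha p_k) - f(x_k) \le \varepsilon \alpha^2 \beta_k (g_k, p_k)$ with $\beta_k = -\operatorname{sgn}(g_k, p_k)$. The function name `choose_step`, the default values of `R`, `eps` and `shrink`, and the cap on the number of reductions are illustrative assumptions.

```python
import math


def choose_step(f, x, p, g, R=10.0, eps=0.25, shrink=0.5, max_reductions=60):
    """Step length by the rule sketched in (6.9)-(6.10).

    Starts from min(1, R*|(g, p)| / ||p||^3) and reduces alpha until
    f(x + alpha*p) - f(x) <= eps * alpha^2 * beta * (g, p),
    where beta = -sgn(g, p) and 0 < eps < 1/2.
    """
    gp = sum(gi * pi for gi, pi in zip(g, p))
    if gp == 0.0:
        raise ValueError("(g, p) = 0: reduce rho and recompute g")
    beta = -1.0 if gp > 0.0 else 1.0
    norm_p = math.sqrt(sum(pi * pi for pi in p))
    alpha = min(1.0, R * abs(gp) / norm_p ** 3)
    fx = f(x)
    for _ in range(max_reductions):
        trial = [xi + alpha * pi for xi, pi in zip(x, p)]
        if f(trial) - fx <= eps * alpha ** 2 * beta * gp:
            return alpha
        alpha *= shrink
    return None


def sphere(x):  # hypothetical test function with f'(x) = 2x, f''(x) = 2I
    return x[0] ** 2 + x[1] ** 2


x_k = [1.0, 1.0]
g_k = [2.0, 2.0]                                          # exact gradient at x_k
alpha_newton = choose_step(sphere, x_k, [-1.0, -1.0], g_k)  # p = -(f'')^{-1} g
alpha_long = choose_step(sphere, x_k, [-4.0, -4.0], g_k)    # overlong direction
x_new = [xi + alpha_long * pi for xi, pi in zip(x_k, [-4.0, -4.0])]
```

With the Newton-like direction the full step is accepted at once, while the overlong direction is first shortened by the bound involving `R` and then halved once before the decrease test passes.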
We now study the properties of sequence (6.1) in constructing
matrix $D_k$, vector $g_k$ and parameter $\alpha_k$ by the method described
above.

Theorem 6.1. If $f(x)$ is a twice continuously differentiable function
that satisfies conditions (2.4), matrix $D_k$ with any $k \ge n - 1$ is
defined by system (6.3), vector $g_k$ is determined by expression (6.7)
where $\rho_k$ satisfies conditions (6.8), and $\alpha_k$ is determined by the method
described above, then for sequence (6.1) statements analogous to those
proved in theorem 3.1 hold.

Proof. In order to take advantage of lemma 6.1, it is necessary
first of all to show that under the conditions of the theorem, condition
$\|x_{k+1} - x_k\| \to 0$ holds for sequence (6.1).

Expanding function $f(x)$ into Taylor's series to the second-order
terms in the region about point $x_k$ we obtain:

$$f_{k+1} - f_k = \alpha_k \beta_k (g_k, p_k) \left[ \frac{(f'_k, p_k)}{\beta_k (g_k, p_k)} + \frac{\alpha_k}{2} \frac{(f''_{kc} p_k, p_k)}{\beta_k (g_k, p_k)} \right]$$

where $x_{kc} = x_k + \vartheta (x_{k+1} - x_k)$, $0 \le \vartheta \le 1$. Since $\beta_k (g_k, p_k) < 0$,
inequality (6.10) is satisfied if

$$\frac{(f'_k, p_k)}{\beta_k (g_k, p_k)} + \frac{\alpha_k}{2} \frac{(f''_{kc} p_k, p_k)}{\beta_k (g_k, p_k)} \ge \alpha_k \varepsilon$$

or, which is the same thing,

$$\frac{1}{\alpha_k} \frac{(f'_k, p_k)}{\beta_k (g_k, p_k)} + \frac{1}{2} \frac{(f''_{kc} p_k, p_k)}{\beta_k (g_k, p_k)} \ge \varepsilon. \tag{6.11}$$

Due to (6.2) and the choice of $\beta_k$,

$$\frac{(f'_k, p_k)}{\beta_k (g_k, p_k)} > 0.$$

Consequently, with a certain $\alpha_k > 0$ the inequality (6.11) is satisfied
and, therefore, (6.10) as well. This proves the possibility of choosing
$\alpha_k$ by the method described above.

Thus, by (6.10), $f_{k+1} < f_k$. This means that $x_k \in S = \{x: f(x) \le f(x_0)\}$
with any $k$, and since $f(x)$ has a lower bound, $f_k - f_{k+1} \to 0$.
Hence, it follows from (6.10) that as $k \to \infty$

$$\alpha_k^2 |(g_k, p_k)| \to 0. \tag{6.12}$$

Since $\alpha_k \le \bar{\alpha}_k$, it follows from (6.9) that

$$|(g_k, p_k)| \ge \frac{\alpha_k}{R} \|p_k\|^3.$$

With account taken of the last inequality, condition (6.12) implies
that $\|x_{k+1} - x_k\| = \alpha_k \|p_k\| \to 0$ as $k \to \infty$. Consequently, the
conditions of the theorem provide for the satisfaction of the
requirements of lemma 6.1 and therefore

$$\|D_k - f''_k\| \to 0. \tag{6.13}$$


We show now that under the conditions of the theorem as $k \to \infty$

$$\frac{(f'_k, p_k)}{\beta_k (g_k, p_k)} \to 1. \tag{6.14}$$

We have

$$\frac{(f'_k, p_k)}{\beta_k (g_k, p_k)} = \frac{1}{\beta_k} + \frac{(f'_k - g_k, p_k)}{\beta_k (g_k, p_k)} \ge \frac{1}{\beta_k} - \frac{\|f'_k - g_k\| \, \|p_k\|}{|(g_k, p_k)|}. \tag{6.15}$$

For vector $g_k$ the following estimate (analogous to (6.4)) holds:

$$\|g_k - f'_k\| \le C_6 |\rho_k| n^{1/2} = C_7 |\rho_k|. \tag{6.16}$$

Since

$$|(g_k, p_k)| = |(D_k p_k, p_k)|, \tag{6.17}$$

it follows because of conditions (6.13) and (2.4) that from a certain
iteration on,

$$|(g_k, p_k)| \ge m_1 \|p_k\|^2 \tag{6.18}$$

where $0 < m_1 < m$. Using estimates (6.8), (6.16) and (6.18), we
find that from a certain iteration on, the following conditions will
be fulfilled:

$$\frac{\|f'_k - g_k\| \, \|p_k\|}{|(g_k, p_k)|} \le \frac{C_7 |\rho_k| \, \|p_k\|}{m_1 \|p_k\|^2} \le \frac{C_7}{m_1} \xi_k \to 0.$$

Hence, it follows from (6.15) that from a certain iteration on, we
have $\beta_k = +1$ (since the left-hand side of (6.15) is positive) and
therefore condition (6.14) is really satisfied.

On set $S$ the gradient $f'(x)$ is bounded: $\|f'_k\| \le L$. Taking into
account also that $|\rho_k| \le \rho < \infty$, we find using (6.16) that
$\|g_k\| \le L_1$ with any $k$. By analogy with theorem 3.1 we can establish
that with any $k \ge n - 1$ we have $\|D_k^{-1}\| \le M_1$. Consequently,

$$\|p_k\| \le \|D_k^{-1}\| \, \|g_k\| \le C_8.$$

Using this estimate and inequality (6.18) we establish that with
sufficiently great $k$

$$\frac{|(g_k, p_k)|}{\|p_k\|^3} \ge \frac{m_1 \|p_k\|^2}{\|p_k\|^3} \ge \frac{m_1}{C_8} = C_9 > 0. \tag{6.19}$$

Hence, it follows from (6.9) that from a certain $k$ on, we shall have

$$\bar{\alpha}_k \ge \bar{\alpha} > 0. \tag{6.20}$$

It can also be easily ascertained using conditions (6.14) and
(6.18) that with sufficiently great $k$ inequality (6.11) and, therefore,
(6.10) too are satisfied with values $\alpha \ge \tilde{\alpha} > 0$. These inequalities
together with estimate (6.20) show that from a certain $k$ on, we
find that

$$\alpha_k \ge C_{10} > 0.$$

Because of this estimate it follows from the condition $\alpha_k \|p_k\| \to 0$,
the fulfilment of which was discussed above, that as $k \to \infty$

$$\|p_k\| \to 0. \tag{6.21}$$

Since $\|g_k\| = \|D_k p_k\| \le M_2 \|p_k\|$, provided (6.21) is satisfied,
we have

$$\|g_k\| \to 0. \tag{6.22}$$


In accordance with conditions (6.8), (6.16), (6.21) and (6.22), we
can assert that as $k \to \infty$

$$\|f'(x_k)\| \to 0.$$

But this means, due to inequality (1.12) which holds for strongly
convex functions, that sequence (6.1) converges to the solution.

Let us obtain an estimate of the rate of convergence of the method.
Due to conditions (6.17), (6.21) and the uniform continuity of
second derivatives of the function on set $S$, as $k \to \infty$ we have

$$\frac{(f''_{kc} p_k, p_k)}{|(g_k, p_k)|} \to 1.$$

Using this condition and (6.14) it is easy to ascertain that
inequality (6.11) and, therefore, (6.10) as well are satisfied with
$\alpha = 1$ for sufficiently great $k$. From relations (6.19), with (6.21)
fulfilled, it follows that

$$\frac{|(g_k, p_k)|}{\|p_k\|^3} \to \infty.$$

Therefore in choosing $\bar{\alpha}_k$ according to condition (6.9), from a certain $k$
on we have $\bar{\alpha}_k = 1$.

The above remarks show that from a certain iteration on, $\alpha_k = 1$
and

$$x_{k+1} - x_k = -D_k^{-1} g_k.$$

At the same time there is a matrix $\bar{D}_k$ such that

$$x_{k+1} - x_k = -\bar{D}_k^{-1} f'_k.$$

The sequence of matrices $\bar{D}_k$ under conditions (6.8) and (6.16) can be
chosen so that

$$\|\bar{D}_k - D_k\| \to 0. \tag{6.23}$$

In order to obtain (6.23) we can, for instance, assume that

$$\bar{D}_k = D_k - \frac{(f'_k - g_k)(x_{k+1} - x_k)^*}{\|x_{k+1} - x_k\|^2}.$$

Indeed, since $D_k (x_{k+1} - x_k) = -g_k$, this choice gives
$\bar{D}_k (x_{k+1} - x_k) = -g_k - (f'_k - g_k) = -f'_k$, and by (6.16) and (6.8)
the norm of the correction does not exceed $C_7 |\rho_k| / \|p_k\| \le C_7 \xi_k \to 0$.

It is now easy to prove that sequence (6.1) converges at a
superlinear rate. We proceed as in theorem 2.1 and establish that the
following inequality holds:

$$\|x_{k+1} - x_*\| \le \|\bar{D}_k^{-1}\| \, \|\bar{D}_k - f''_{kc}\| \, \|x_k - x_*\|.$$

Further using conditions (6.13), (6.23) and the continuity of second
derivatives we ascertain that as $k \to \infty$, $\|\bar{D}_k - f''_{kc}\| \to 0$ and
quantity $\|\bar{D}_k^{-1}\|$ has a bound. Hence, as $k \to \infty$ we have

$$\|x_{k+1} - x_*\| \le \lambda_k \|x_k - x_*\| \tag{6.24}$$

where $\lambda_k \to 0$, and this proves that the rate of convergence of $\{x_k\}$
is superlinear.
The theorem is proved.


Remarks on the Implementation of Methods
of Dual Directions


Various algorithms. The requirements which should be met by
vectors $r_k$ used in constructing matrix $D_k$ are the same as those
considered in constructing sequence (3.5). Therefore, all that was
said in the subsection on p. 74 about the construction of various
algorithms of type (3.4) holds for process (6.1).

Calculation of vector $p_k$. The results of the subsection on p. 76
are fully applicable here. Thus, basis $s_{k+1}, s_k, \ldots, s_{k-n+2}$, the
dual of basis $\psi_{k+1}, \psi_k, \ldots, \psi_{k-n+2}$, is constructed by the following
formulas (analogous to (3.21)):

$$s_{k+1} = \frac{s_{k-n+1}}{(s_{k-n+1}, \psi_{k+1})},$$

$$s_{k+1-j} := s_{k+1-j} - (s_{k+1-j}, \psi_{k+1}) \, s_{k+1}, \quad j = 1, \ldots, n - 1,$$

where the vectors $s_{k+1-j}$ on the right-hand side are those of the
previous dual basis.

In this case in order to check that vectors $\psi_{k+1}, \psi_k, \ldots, \psi_{k-n+2}$
are linearly independent, it suffices to calculate the scalar product
$(s_{k-n+1}, \psi_{k+1})$; if $(s_{k-n+1}, \psi_{k+1}) \ne 0$, then vectors $\psi_{k+1}, \psi_k, \ldots, \psi_{k-n+2}$
are linearly independent. But if we find that $(s_{k-n+1}, \psi_{k+1}) = 0$, then
it is necessary to change either vector $r_{k+1}$ or one of the vectors $\theta_{k+1}$,
$\varphi_{k+1}$, thus changing vector $\psi_{k+1}$.

In practice, successive approximations should be constructed by
the following formula (analogous to (3.25)):

$$x_{k+1} = x_k - \alpha_k \sum_{i=0}^{n-1} (s_{k-i}, g_k) \, r_{k-i}. \tag{6.25}$$
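
The dual-basis update can be sketched in Python as follows, assuming the vectors are stored newest first and that the oldest vector $\psi_{k-n+1}$ is the one being replaced by $\psi_{k+1}$. The names `update_dual_basis` and `dot` and the small two-dimensional data set are hypothetical and serve only to check the biorthogonality relations.

```python
def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def update_dual_basis(s_old, psi_new):
    """Update a dual basis when the oldest psi is replaced by psi_new.

    s_old = [s_k, s_{k-1}, ..., s_{k-n+1}] (newest first) is dual to
    [psi_k, ..., psi_{k-n+1}]. The result [s_{k+1}, s_k, ..., s_{k-n+2}]
    is dual to [psi_new, psi_k, ..., psi_{k-n+2}].
    """
    oldest = s_old[-1]
    denom = dot(oldest, psi_new)
    if denom == 0.0:
        # (s_{k-n+1}, psi_{k+1}) = 0: the new system is linearly dependent
        raise ValueError("psi_new is linearly dependent on the kept vectors")
    s_new = [c / denom for c in oldest]
    kept = []
    for s_j in s_old[:-1]:
        coeff = dot(s_j, psi_new)
        kept.append([c - coeff * t for c, t in zip(s_j, s_new)])
    return [s_new] + kept


# Hypothetical data: the coordinate basis is its own dual.
psi_old = [[1.0, 0.0], [0.0, 1.0]]
s_old = [[1.0, 0.0], [0.0, 1.0]]
psi_new = [1.0, 2.0]
s_updated = update_dual_basis(s_old, psi_new)
psi_updated = [psi_new, psi_old[0]]
```

The linear-independence check from the text appears here as the test on `denom`; each update costs on the order of $n^2$ operations, which is the point of maintaining the dual basis instead of inverting $D_k$ anew.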


The initial stage of the process. There are several ways of
performing the first iterations of the process (with $k < n - 1$). For instance,
the descent can be realized in one of the directions $\beta_k g_k$, $\beta_k = \pm 1$,
choosing the sign of $\beta_k$ so that $f(x)$ decreases.

In order to ensure uniformity of the iterative process (6.25), it
can be started in a way analogous to that given in the subsection
on p. 79.


Minimizing a quadratic form. Let $f(x) = \frac{1}{2}(Ax, x) + (b, x) + c$,
where $(Ax, x) > 0$ with any $x \ne 0$. In this case it is easily ascertained
that vector $\theta_k = g_k = f'(x_k)$, $\varphi_k = f'(y_k)$, $\psi_k = e_k$, i.e. $D_k = A_k$,
and process (6.1) coincides with (3.4). Consequently (see the
subsection on p. 79), process (6.1) allows one to find the minimum of a
quadratic function after $n$ steps. It is necessary in this case to calculate
$(n + 1)^2$ function values.

Choosing vector $g_k$. In method (6.1), besides approximating
matrix $f''(x)$, we also substitute for gradient $f'(x)$ its finite-difference
analogue, vector $g_k$. In this case, as was noted above, in order to obtain
a superlinear rate of convergence, conditions (6.8) are to be satisfied.
If to satisfy these conditions we have to calculate vector $g_k$ several
times at a certain iteration, the amount of work required in the
process increases (particularly for a multidimensional space).

Note that if $\|p_k\| < \|p_{k-1}\|$ at each iteration, then one can choose
$|\rho_k| = \|p_{k-1}\|^2$. It is very probable that with such a manner of
choosing $\rho_k$, the right-hand one of the inequalities (6.8) will be
satisfied, at least from a certain iteration on. Indeed, in the end we
obtain bounds (6.24) on the rate of convergence.

The rate of convergence of a process estimated in this mode is
usually slower than the quadratic one:

$$\|x_{k+1} - x_k\| > \|x_k - x_{k-1}\|^2, \quad \|x_k - x_{k-1}\| \to 0,$$

i.e. with bounds (6.24) usually $\|p_{k-1}\|^2 < \|p_k\|$ (recall that from
a certain $k$ on, we have $\alpha_k = 1$, i.e. $p_k = x_{k+1} - x_k$). Therefore, if
sequence $\{\xi_k\}$ is chosen such that $\xi_k \to 0$ at a sufficiently slow rate,
we can expect that with $\rho_k = \|p_{k-1}\|^2$ it will not be necessary to
calculate $g_k$ many times in order to satisfy (6.8). If, however,
conditions (6.8) are not satisfied from the beginning (i.e. if we have to
reduce $\rho_k$), this will suggest that the rate of convergence is close to
the quadratic one.

In conclusion, note also that using the results of the subsection on
p. 74 and of this section it is possible to establish the conditions
of convergence of the modification of Newton's method that does not
require calculation of derivatives.


Methods of Conjugate Directions 


We consider a method of constructing conjugate directions which
differs in essence from the methods discussed in Sec. 4.
Let again

$$f(x) = \frac{1}{2}(Ax, x) + (b, x) + c$$

where $(Ax, x) > 0$ with any $x \ne 0$. Suppose that directions $p_1, \ldots, p_m$,
$m < n$ (not equal to zero) are $A$-orthogonal and $E^m(x_0)$
and $E^m(x_{0,m})$ are two different $m$-dimensional subspaces of space $E^n$
that are formed by vectors $p_1, \ldots, p_m$ and pass through points $x_0$
and $x_{0,m}$.

If $x_m$ and $x_{m,m}$ are points of the minimum of $f(x)$ in subspaces
$E^m(x_0)$ and $E^m(x_{0,m})$, then

$$(f'(x_m), p_i) = 0, \quad (f'(x_{m,m}), p_i) = 0, \quad i = 1, 2, \ldots, m.$$

Consequently, $(f'(x_m) - f'(x_{m,m}), p_i) = 0$ or

$$(A(x_m - x_{m,m}), p_i) = 0, \quad i = 1, 2, \ldots, m.$$

Thus if points of the minimum of $f(x)$ are determined in different
subspaces formed by $A$-orthogonal directions $p_1, \ldots, p_m$, then the
direction $p_{m+1} = x_{m,m} - x_m$ proves to be conjugate to directions
$p_1, \ldots, p_m$.

The method described of constructing conjugate vectors does not
require calculation of the gradient or its finite-difference
approximation. Let us now describe a concrete algorithm for the minimization
of a quadratic function in which the construction of conjugate
vectors is performed by the method described.

We choose arbitrarily point $x_0$ and vector $p_1$; the $m$-th iteration of
the algorithm ($m = 1, 2, \ldots, n$) is performed as follows:

(1) Calculate point

$$x_m = x_{m-1} + \alpha_m p_m \tag{6.26}$$

where $\alpha_m$ is determined under the condition of the minimum function
value:

$$f(\alpha) = f(x_{m-1} + \alpha p_m).$$

(2) Calculate point

$$x_{0,m} = x_m + r_m \tag{6.27}$$

where $r_m$ is an arbitrary vector which is not a linear combination of
vectors $p_1, \ldots, p_m$ (below we shall dwell at some length on the
question of the choice of $r_m$).

(3) Calculate points

$$x_{k,m} = x_{k-1,m} + \alpha_{k,m} p_k, \quad k = 1, \ldots, m,$$

where factor $\alpha_{k,m}$ is determined under the condition of the value
of function $f(\alpha) = f(x_{k-1,m} + \alpha p_k)$ being minimum.

(4) Now calculate vector $p_{m+1} = x_{m,m} - x_m$. This is the end of
the $m$-th iteration.

Vector $r_m$ (in (6.27)) must not belong to the subspace $E^m(x_0)$
so that point $x_{0,m}$ would not belong to subspace $E^m(x_0)$. Since
point $x_m$ is the minimum point of $f(x)$ in subspace $E^m(x_0)$, it is
clear that any vector $x - x_m$ in whose direction function $f(x)$
decreases does not belong to $E^m(x_0)$. Consequently, any direction of
descent of $f(x)$ from point $x_m$ can be taken as $r_m$. In particular, it is
convenient to choose vector $r_m$ along one of the coordinate axes;
then if such a vector proves not to be the direction of descent, it is
necessary to take as $r_m$ a vector along another axis.

According to the results of Sec. 4, point $x_n$ calculated by
formula (6.26) is the minimum point of $f(x)$: $x_n = x_*$. In order to find
point $x_n$ we have to solve one-dimensional minimization problems
(to determine factors $\alpha_m$ and $\alpha_{k,m}$) $1 + 2 + \ldots + n = \frac{n(n+1)}{2}$
times.
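
Steps (1) to (4) can be tried out on a small quadratic with the Python sketch below. Several choices in it are assumptions made for illustration and are not prescribed by the text: the one-dimensional minimizations use a three-point formula that is exact only for quadratic functions, vector $r_m$ is taken along the first coordinate axis that gives a decrease, and the names (`conjugate_directions`, `line_min_quadratic`, the tolerance `tol`) and the test matrix are hypothetical.

```python
def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def step(x, alpha, p):
    return [xi + alpha * pi for xi, pi in zip(x, p)]


def line_min_quadratic(f, x, p):
    """Minimiser of alpha -> f(x + alpha*p), exact when f is quadratic.

    Uses only the three function values f(x - p), f(x), f(x + p).
    """
    f_minus, f_zero, f_plus = f(step(x, -1.0, p)), f(x), f(step(x, 1.0, p))
    curvature = f_plus - 2.0 * f_zero + f_minus  # equals (Ap, p) > 0
    return -(f_plus - f_minus) / (2.0 * curvature)


def conjugate_directions(f, x0, p1, tol=1e-9):
    """Steps (1)-(4) for a quadratic f; returns x_n and the directions found."""
    n = len(x0)
    x, directions = list(x0), [list(p1)]
    for m in range(1, n + 1):
        p_m = directions[-1]
        x = step(x, line_min_quadratic(f, x, p_m), p_m)       # step (1)
        if m == n:
            break
        r_m = None                                            # step (2)
        for j in range(n):
            axis = [1.0 if i == j else 0.0 for i in range(n)]
            alpha = line_min_quadratic(f, x, axis)
            if abs(alpha) > tol:      # f decreases along this axis
                r_m = [alpha * a for a in axis]
                break
        if r_m is None:               # no descent axis: x is already the minimum
            break
        y = step(x, 1.0, r_m)                                 # point x_{0,m}
        for p_k in directions:                                # step (3)
            y = step(y, line_min_quadratic(f, y, p_k), p_k)
        directions.append([yi - xi for yi, xi in zip(y, x)])  # step (4)
    return x, directions


# Hypothetical test problem: f(x) = 1/2 (Ax, x) + (b, x).
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [-1.0, -2.0, -3.0]


def quadratic(x):
    Ax = [dot(row, x) for row in A]
    return 0.5 * dot(Ax, x) + dot(b, x)


x_n, dirs = conjugate_directions(quadratic, [0.0, 0.0, 0.0], [1.0, 0.0, 0.0])
```

For this example the minimum point solves $Ax = -b$, i.e. $x_* = (2/9, 1/9, 13/9)$, and the three directions returned are pairwise $A$-orthogonal up to rounding.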

Using this approach to the construction of the method of conjugate
directions, one can construct various algorithms for the
minimization of nonquadratic functions. Of course, in any algorithm of this
kind, the directions $p_1, \ldots, p_m$, $m < n$, will no longer be conjugate
(see the subsection on p. 103). However, we can expect that suitably
worked out methods in a sufficiently small neighbourhood of the
minimum point $x_*$ (of a convex smooth function) will make possible
the construction of vectors that are close enough in their
properties to the conjugate ones. Such algorithms may prove effective
in minimizing nonquadratic functions.


We shall consider below an algorithm based on the above con- 
siderations. 


Let $x_{1,0}$ be an arbitrary point and $v_{1,1}, \ldots, v_{1,n}$ be an
orthonormalized coordinate basis; the $k$-th iteration of the algorithm,
$k = 1, 2, \ldots$, consists of the following steps:

(1) For $i = 1, 2, \ldots, n$ calculate

$$x_{k,i} = x_{k,i-1} + \alpha_{k,i} v_{k,i}$$

where $\alpha_{k,i}$ are determined under the condition that the function
value is minimum:

$$f(\alpha) = f(x_{k,i-1} + \alpha v_{k,i}).$$

(2) Assume that

$$v_{k,n+1} = \frac{x_{k,n} - x_{k,0}}{\gamma_k}$$

where $\gamma_k = \|x_{k,n} - x_{k,0}\|$, and calculate point
$x_{k,n+1} = x_{k,n} + \alpha_{k,n+1} v_{k,n+1}$, where $\alpha_{k,n+1}$ is determined under the
condition that the value of the function

$$f(x_{k,n} + \alpha v_{k,n+1})$$

be minimum.

(3) Let $\alpha_{k,s} = \max \{\alpha_{k,i}: i = 1, 2, \ldots, n\}$, let $\Delta_k$ be a
determinant whose columns are vectors $v_{k,1}, \ldots, v_{k,n}$, and let $\varepsilon > 0$
be an arbitrary small positive constant. If

$$\frac{\alpha_{k,s} \Delta_k}{\gamma_k} \ge \varepsilon,$$

we set $v_{k+1,i} = v_{k,i}$ with $i \ne s$ and $v_{k+1,s} = v_{k,n+1}$; then
we have

$$\Delta_{k+1} = \frac{\alpha_{k,s} \Delta_k}{\gamma_k}. \tag{6.28}$$

If we find that $\frac{\alpha_{k,s} \Delta_k}{\gamma_k} < \varepsilon$, we take $v_{k+1,i} = v_{k,i}$ for all
$i = 1, 2, \ldots, n$; then $\Delta_{k+1} = \Delta_k$.

(4) Take $x_{k+1,0} = x_{k,n+1}$; this is the end of the $k$-th iteration.
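
One iteration of this algorithm might be sketched in Python as shown below. The sketch rests on assumptions that are not in the text: the one-dimensional minimizations use a three-point formula that is exact only for quadratic functions (a general function would need a proper line search), the guard `gamma_min` against a vanishing $\gamma_k$ is a hypothetical numerical safeguard, and the function names and the test function are illustrative.

```python
import math


def step(x, alpha, v):
    return [xi + alpha * vi for xi, vi in zip(x, v)]


def line_min_quadratic(f, x, v):
    """Exact minimiser of alpha -> f(x + alpha*v) for quadratic f (three points)."""
    f_minus, f_zero, f_plus = f(step(x, -1.0, v)), f(x), f(step(x, 1.0, v))
    return -(f_plus - f_minus) / (2.0 * (f_plus - 2.0 * f_zero + f_minus))


def iteration(f, x0, V, delta, eps=1e-3, gamma_min=1e-9):
    """One iteration, steps (1)-(4); V lists v_{k,1..n}, delta is their determinant."""
    x, alphas = list(x0), []
    for v in V:                                        # step (1)
        a = line_min_quadratic(f, x, v)
        alphas.append(a)
        x = step(x, a, v)
    diff = [xi - x0i for xi, x0i in zip(x, x0)]
    gamma = math.sqrt(sum(d * d for d in diff))
    if gamma <= gamma_min:      # hypothetical safeguard: no measurable progress
        return x, V, delta
    v_new = [d / gamma for d in diff]                  # step (2)
    x = step(x, line_min_quadratic(f, x, v_new), v_new)
    s = max(range(len(V)), key=lambda i: alphas[i])    # step (3)
    ratio = alphas[s] * delta / gamma
    if ratio >= eps:
        V = V[:s] + [v_new] + V[s + 1:]
        delta = ratio
    return x, V, delta                                 # step (4): x is x_{k+1,0}


def example_f(x):  # hypothetical quadratic with minimum at (2, -1)
    return x[0] ** 2 + x[0] * x[1] + x[1] ** 2 - 3.0 * x[0]


x1, V1, delta1 = iteration(example_f, [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], 1.0)
det_V1 = V1[0][0] * V1[1][1] - V1[0][1] * V1[1][0]
x2, V2, delta2 = iteration(example_f, x1, V1, delta1)
```

In this two-dimensional example the tracked value `delta1` coincides with the determinant of the new direction system computed directly, which is the content of the determinant update rule in step (3).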

Equality (6.28) must be proved. However, preliminarily we shall
discuss the algorithm proposed.

Let us consider a simplified variant of the algorithm whose $k$-th
iteration is performed as follows:

(1) Construct points $x_{k,i}$, $i = 1, 2, \ldots, n$, in the same way
as in step (1) of the original algorithm.

(2) Calculate $x_{k,n+1} = x_{k,n} + \alpha_{k,n+1} v_{k,n+1}$ where
$v_{k,n+1} = x_{k,n} - x_{k,0}$ and $\alpha_{k,n+1}$ provides the minimum of function
$f(x_{k,n} + \alpha v_{k,n+1})$.

(3) Set $v_{k+1,i} = v_{k,i+1}$, $i = 1, 2, \ldots, n$.

(4) Set $x_{k+1,0} = x_{k,n+1}$.

Let $k = 2$. Then we have

$$x_{2,0} = x_{1,n+1} = x_{1,n} + \alpha_{1,n+1} v_{1,n+1},$$

$$x_{2,n} = x_{2,n-1} + \alpha_{2,n} v_{2,n} = x_{2,n-1} + \alpha_{2,n} v_{1,n+1},$$

i.e. points $x_{2,0}$ and $x_{2,n}$ are minimum points of $f(x)$ in the
one-dimensional subspaces (formed by vector $v_{1,n+1}$) which pass through
two different points $x_{1,n}$ and $x_{2,n-1}$. If $f(x)$ is a quadratic function,
then according to the foregoing the direction $v_{2,n+1} = x_{2,n} - x_{2,0}$
is found to be conjugate to the direction $v_{1,n+1} = v_{2,n}$. By similar
reasoning, it can be ascertained that if with any $k = 1, 2, \ldots, n$
vectors $v_{k,1}, \ldots, v_{k,n}$ are linearly independent, then after the
$k$-th iteration vectors $v_{k,n+1}, v_{k,n}, \ldots, v_{k,n-k+2}$ will prove
conjugate, i.e. after $n$ iterations of the process we shall have
constructed $n$ conjugate vectors. However, it is impossible with this
method of constructing vectors $v_{k,1}, \ldots, v_{k,n}$ to guarantee their
linear independence. Indeed, if with a certain $k$ we have $\alpha_{k,1} = 0$
then, as is easily ascertained,

$$v_{k,n+1} = x_{k,n} - x_{k,0} = x_{k,n} - x_{k,1} = \sum_{i=2}^{n} \alpha_{k,i} v_{k,i},$$

i.e. at the $(k + 1)$-th iteration the system of vectors $v_{k+1,i} = v_{k,i+1}$,
$i = 1, 2, \ldots, n$, is found to be linearly dependent. In this case
it is not possible to construct a system of $n$ conjugate vectors; this
means that with the application of this simplified algorithm we
cannot guarantee that a solution will be obtained even for a
quadratic function. The more complicated steps (2) and (3) of the original
algorithm are used just in order to avoid linear dependence of
vectors $v_{k,i}$, $i = 1, 2, \ldots, n$ (we find that $\Delta_k \ge \varepsilon$).

However, note that in minimizing a quadratic function with
the aid of the original algorithm, it is impossible to guarantee that
the problem will be solved after a finite number of iterations. Indeed,
if we go over from the system of vectors $v_{k,1}, \ldots, v_{k,n}$ to the
system $v_{k+1,1}, \ldots, v_{k+1,n}$, it can occur that one of the conjugate
vectors already constructed is changed (see step (3)); therefore it is
impossible to guarantee that $n$ conjugate vectors will be obtained
after a finite number of iterations. Besides, the system of vectors $v_{k,i}$
can remain unchanged in going over to the $(k + 1)$-th iteration.

We show now that equality (6.28) holds:

$$\Delta_{k+1} = \det [(v_{k+1,1}, \ldots, v_{k+1,n})] = \det [(v_{k,1}, \ldots, v_{k,s-1}, v_{k,n+1}, v_{k,s+1}, \ldots, v_{k,n})].$$

But

$$v_{k,n+1} = \frac{1}{\gamma_k} (x_{k,n} - x_{k,0}) = \frac{1}{\gamma_k} \sum_{i=1}^{n} \alpha_{k,i} v_{k,i}.$$

Consequently

$$\Delta_{k+1} = \frac{\alpha_{k,s}}{\gamma_k} \det [(v_{k,1}, \ldots, v_{k,s}, \ldots, v_{k,n})] = \frac{\alpha_{k,s} \Delta_k}{\gamma_k}.$$

Thus with any $k$ we find that $\Delta_k \ge \varepsilon$, and just this guarantees
that vectors $v_{k,1}, \ldots, v_{k,n}$ are linearly independent.

Let us study certain properties of this algorithm. 

Theorem 6.2. Let $f(x)$ be a continuously differentiable strictly convex
function such that set $S = \{x: f(x) \le f(x_{1,0})\}$ is bounded with an
arbitrarily chosen point $x_{1,0}$. Then sequence

$$\{x_{k,i}\}, \quad i = 0, 1, \ldots, n, \quad k = 1, 2, \ldots \tag{6.29}$$

constructed by the method described above converges to the minimum
point of function $f(x)$.

Proof. The existence and uniqueness of minimum point $x_*$ of
function $f(x)$ under the conditions of the theorem follow from the
results of lemmas 3.1 and 3.4 (Chap. I). Therefore it remains only
to prove that sequence $\{x_{k,i}\}$ is convergent. Any point of
sequence (6.29) $x_{k,i} \in S$ since

$$f(x_{k,i}) = \min_{\alpha} f(x_{k,i-1} + \alpha v_{k,i}) \le f(x_{k,i-1})$$

and $f(x_{k+1,0}) = f(x_{k,n+1}) \le f(x_{k,n})$. Set $S$ is bounded,
i.e. (in $E^n$) it is compact. Consequently, from every infinite
sequence of elements of this set it is possible to pick out a subsequence
which converges to a certain element of $S$. If we consider sequence
$\{x_{k,i}\}$ with a fixed $i = 0, 1, \ldots, n$, then, by virtue of what has
been stated, there is an infinite subsequence $\{x_{k_m,i}\}$ that converges
to a point $x_i \in S$. At the same time since $f(x)$ has a lower bound, we
have $f(x_{k_m,i+1}) - f(x_{k_m,i}) \to 0$. It follows, taking into account the
continuity of $f(x)$, that the following equalities hold:

$$f(x_{i+1}) = \lim_{k_m \to \infty} f(x_{k_m,i+1}) = \lim_{k_m \to \infty} f(x_{k_m,i}) = f(x_i). \tag{6.30}$$
Let us demonstrate that $\bar x_{i+1} = \bar x_i$ for all $i = 0, 1, \dots, n - 1$.
By construction, $\|v_{k,i}\| = 1$ for any $k$ and $i$; consequently,
vectors $v_{k,i}$ can be considered to be elements of the unit sphere (a
bounded set), and therefore for any fixed $i = 1, \dots, n$ there is
a subsequence $\{v_{k_m,i}\}$ that converges to a certain vector $\bar v_i$. Since
$x_{k_m,i+1} = x_{k_m,i} + \alpha_{k_m,i+1} v_{k_m,i+1}$ and $x_{k_m,i+1} \to \bar x_{i+1}$,
$x_{k_m,i} \to \bar x_i$, $v_{k_m,i+1} \to \bar v_{i+1}$, we have

$$\bar x_{i+1} = \bar x_i + \bar\alpha_{i+1} \bar v_{i+1}, \quad i = 0, 1, \dots, n - 1,$$

where $\bar\alpha_{i+1} = \lim_{k_m \to \infty} \alpha_{k_m,i+1}$. Since the condition
$f(x_{k_m,i+1}) = \min_{\alpha} f(x_{k_m,i} + \alpha v_{k_m,i+1})$ is satisfied at
point $x_{k_m,i+1}$, we must have

$$f(\bar x_{i+1}) = \min_{\alpha} f(\bar x_i + \alpha \bar v_{i+1}), \quad i = 0, 1, \dots, n - 1, \tag{6.31}$$

i.e. the minimum of $f(x)$ in the direction $\bar v_{i+1}$ is attained at point
$\bar x_{i+1}$. But it follows from (6.30) that $f(\bar x_{i+1}) = f(\bar x_i)$. Since $f(x)$ is
strictly convex, there is a unique minimum point in the direction
$\bar v_{i+1}$; hence $\bar x_{i+1} = \bar x_i$.
Thus we find that $\bar x_0 = \bar x_1 = \dots = \bar x_n$. Denoting this common
point by $\bar x$, we can rewrite condition (6.31) as follows:

$$f(\bar x) \le f(\bar x + \alpha \bar v_i), \quad i = 1, 2, \dots, n \tag{6.32}$$

for any $\alpha$. For a differentiable function these conditions are
equivalent to the following ones:

$$(f'(\bar x), \bar v_i) = 0, \quad i = 1, 2, \dots, n. \tag{6.33}$$

Note now that since $\det[(v_{k,1}, \dots, v_{k,n})] \ge \varepsilon$, we have also
$\det[(\bar v_1, \dots, \bar v_n)] \ge \varepsilon$. It follows that vectors $\bar v_1, \dots, \bar v_n$ are
linearly independent. Taking this into account, we have from (6.33)
that $f'(\bar x) = 0$. Due to the strict convexity of $f(x)$, this means that
$\bar x$ is the minimum point of $f(x)$: $\bar x = x_*$.
Thus we have proved that there is a subsequence $\{x_{k_m,i}\}$ which
converges to point $x_*$. However, since for any fixed $i = 0, 1, \dots, n$
we have $f(x_{k+1,i}) \le f(x_{k,i})$ and $f(x)$ has a lower bound,
the following condition is satisfied:

$$\lim_{k \to \infty} f(x_{k,i}) = \lim_{k_m \to \infty} f(x_{k_m,i}) = f(\bar x_i) = f(x_*).$$

It follows that with a fixed $i$ the sequence $\{x_{k,i}\}$ is a minimizing
one; consequently the sequence (6.29) is a minimizing one as well,
and therefore, since there is only one minimum, the sequence
converges to point $x_*$. The theorem is proved.

It is easy to ascertain that in proving that conditions (6.32) hold,
we made no use of the fact that $f(x)$ is differentiable, i.e.
these inequalities hold also for a strictly convex continuous function.
However, in this case the point $\bar x$, a limit point of sequence (6.29),
need not be the minimum point of $f(x)$ (and sequence (6.29) can then
have more than one limit point).
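
The remark can be illustrated on a small example. The function below is an assumption chosen for illustration (it does not come from the text): it is strictly convex and continuous but not differentiable on the line $x = y$. At the point $(1, 1)$ a line search along either coordinate direction returns a zero step, so conditions of type (6.32) hold there, although the minimum is at the origin. The helper `line_min` is a plain golden-section search, again only a sketch.

```python
def f(x, y):
    # Strictly convex and continuous, but not differentiable where x == y.
    return abs(x - y) + 0.1 * (x * x + y * y)

def line_min(g, lo=-10.0, hi=10.0, iters=200):
    """Golden-section search for the minimizer of a unimodal g on [lo, hi]."""
    r = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    for _ in range(iters):
        c, d = b - r * (b - a), a + r * (b - a)
        if g(c) < g(d):
            b = d          # the minimizer lies in [a, d]
        else:
            a = c          # the minimizer lies in [c, b]
    return (a + b) / 2

x, y = 1.0, 1.0
step_x = line_min(lambda t: f(x + t, y))   # best step along (1, 0)
step_y = line_min(lambda t: f(x, y + t))   # best step along (0, 1)
stuck_value = f(x, y)                      # value at the stationary point of the process
true_minimum = f(0.0, 0.0)                 # value at the actual minimum point
```

Both steps come out numerically zero while `true_minimum` is smaller than `stuck_value`, which is the behaviour the remark warns about.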


Discussion of Results 


Note first of all that the field of application of the method of 
conjugate directions is broader than that of methods of dual direc- 
tions; this is easily ascertained by comparing the requirements 
imposed on the function being minimized in theorems 6.1 and 6.2. 

The properties of the method of conjugate directions under
consideration have not yet been studied sufficiently. Thus it is
not yet clear what the rate of convergence of the algorithm is.
Nevertheless, it is evidently slower (in minimizing functions of the
same class) than that of the methods of Sec. 5; this can be judged even
by the fact that the algorithm under consideration does not guarantee
the finding of the minimum of a quadratic form after $n$ iterations
(or, in fact, after any finite number of steps), i.e. it does not
guarantee the construction of a system of $n$ conjugate vectors after
a finite number of iterations. Consequently, from the viewpoint
of the rate of convergence, methods of dual directions with their
superlinear rate of convergence have an advantage over the methods
of conjugate directions.

Let us make an attempt to compare the amounts of computations 
at iterations of the algorithms studied. 

In a method of type (6.1), it is necessary at each iteration to
calculate the function value $n + 1$ or $2(n + 1)$ times for the
construction of matrix $D_k^{-1}$ and $n + 1$ times for the construction of vector $g_k$,
depending on the variant applied (see the subsection on p. 136);
at the same time it can occur at some iterations that in determining 
g;, there is no need to perform new evaluations of the function (if 
O,n = Lz), or the amount of calculations can increase several times 
depending on how close the gradient is approximated by g;. Besides, 
it is necessary to perform some more calculations of function values 
in order to choose the direction of motion and the step size. 

In the method of conjugate directions it is necessary to calculate
at each iteration the minimum of the function in the direction of
motion $n + 1$ times. If we assume that solving a one-dimensional
minimization problem requires on the average 3 or
4 function values, then the amount of calculations at each iteration
of the method being studied is about the same. It is not yet
clear, though, with what accuracy the minimum in
a direction of motion must be computed in the method of
conjugate directions so that the properties of the process are not
violated. From the viewpoint of the influence on the convergence,
the algorithm for choosing $\alpha_k$ in process (6.1) is to be preferred.

On the whole, given the possibility of using methods of type (6.1), 
they must be more effective than the method of conjugate direc- 
tions; however, it should be stressed once more that the field of 
application of the latter is broader. 

Finally, it should be noted that in studying process (6.1) we
showed in effect that calculation errors of the order of
$O(\|r_k\|^2)$ in determining vector $e_k$ (see (6.4), (6.5), (6.6)) and of the
order of $O(\varepsilon_k \|p_k\|)$ in determining vector $f'(x_k)$ (see (6.16))
do not violate the properties of process (3.4) (convergence, bounds
on the rate of convergence). If we consider the variant of process (3.4)
in which $r_{k+1} = x_{k+1} - x_k$, then we can obtain other expressions for
estimating the errors. From a certain step on in process (3.4), $\alpha_k = 1$
and, consequently, we have

$$\|r_k\| = \|p_{k-1}\| = \|x_k - x_{k-1}\| = \|A_{k-1} f'_{k-1}\| \ge m \|f'_{k-1}\|.$$

Taking into account (1.12), we obtain $\|r_k\| \ge m_1 m \|x_{k-1} - x_*\|$.
Thus if $r_{k+1} = x_{k+1} - x_k$, errors in the calculation of vectors $e_k$
and $f'_k$ of the order of $O(\|x_{k-1} - x_*\|^2)$ and $O(\varepsilon_k \|x_k - x_*\|)$ do not
affect the properties of process (3.4).
CHAPTER III


METHODS OF CONSTRAINED 
FUNCTION MINIMIZATION 


This chapter describes various methods of function minimization
with constraints on the variables. The first section develops methods
of solving problems of quadratic programming, which arise as a subsidiary
problem in many algorithms. The following sections describe
algorithms for solving problems of convex and nonconvex programming.
Wherever possible, bounds on the rate of convergence are given.


1. PROBLEM OF QUADRATIC PROGRAMMING 


Usually the problem of quadratic programming is understood
to be the problem of minimizing a quadratic function with
linear constraints. Thus the problem of quadratic programming is the
minimization of the function

$$f(x) = \tfrac{1}{2}(x, Cx) + (d, x) \tag{1.1}$$

with the following constraints:

$$(a_i, x) - b_i \le 0, \quad i \in J^-, \qquad (a_i, x) - b_i = 0, \quad i \in J^0, \tag{1.2}$$

where $x \in E^n$, $a_i \in E^n$, $i \in J^- \cup J^0$, $d \in E^n$, $b_i$ are numbers, $C$ is
an $n \times n$ symmetric positive semidefinite matrix, i.e. $(x, Cx) \ge 0$
for all $x$, and $J^-$ and $J^0$ are finite sets of indices.

The basis of the numerical method of solving this problem is the
method of conjugate gradients. The main idea of the application of
this method to problem (1.1)-(1.2) is as follows.

Let $x_0$ be a point which satisfies constraints (1.2). We pick out
among the constraints those which are satisfied as equalities. These
constraints determine a certain face of the polyhedral set defined
by linear inequalities (1.2). We find the minimum of $f(x)$ on this
face using the method of conjugate gradients. The point obtained either
is the solution of our problem or indicates a transition to a new face,
and then the procedure is repeated. Since the method of conjugate
gradients minimizes $f(x)$ after a finite number of steps,
and the number of faces of the polyhedral set is finite, it is clear
that an algorithm of this kind converges after a finite number of
steps.
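
The problem data can be written down directly. The sketch below assumes the objective has the form $f(x) = \tfrac12 (x, Cx) + (d, x)$, so that $f'(x) = Cx + d$; the function names and the small numerical example are hypothetical and serve only to show how feasibility and the set of constraints satisfied as equalities (the current face) would be checked.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def f(C, d, x):
    # Quadratic objective, assuming the form (1/2)(x, Cx) + (d, x).
    return 0.5 * dot(x, matvec(C, x)) + dot(d, x)

def grad_f(C, d, x):
    # Gradient f'(x) = Cx + d for symmetric C.
    return [c + di for c, di in zip(matvec(C, x), d)]

def is_feasible(x, ineq, eq, tol=1e-9):
    """ineq and eq are lists of pairs (a_i, b_i) for i in J^- and J^0."""
    return (all(dot(a, x) - b <= tol for a, b in ineq)
            and all(abs(dot(a, x) - b) <= tol for a, b in eq))

def active_set(x, ineq, eq, tol=1e-9):
    """Constraints satisfied as equalities at x, tagged by their list."""
    act = [("ineq", i) for i, (a, b) in enumerate(ineq) if abs(dot(a, x) - b) <= tol]
    return act + [("eq", i) for i in range(len(eq))]

C = [[2.0, 0.0], [0.0, 2.0]]
d = [-2.0, -4.0]
ineq = [([1.0, 1.0], 1.0)]      # x1 + x2 <= 1
eq = [([1.0, -1.0], 0.0)]       # x1 - x2 = 0
x0 = [0.5, 0.5]
```

At `x0` both constraints hold as equalities, so the face through `x0` is the single point where the two lines meet.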


Operators of Projection 


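Let now $J = J^- \cup J^0$ and let $\mathcal{I}$ be a subset of the set of indices $J$.
We form the matrix $A_{\mathcal{I}}$ whose rows are the vectors $a_i$, $i \in \mathcal{I}$, so that the
matrix is $m \times n$-dimensional, where $m$ is the number of elements
in set $\mathcal{I}$.

Lemma 1.1. If vectors $a_i$, $i \in \mathcal{I}$ are linearly independent, then
matrix $A_{\mathcal{I}} A_{\mathcal{I}}^*$ is nonsingular.

Proof. Let $y \in E^m$ be a nonzero vector such that

$$A_{\mathcal{I}} A_{\mathcal{I}}^* y = 0. \tag{1.3}$$

Then $(y, A_{\mathcal{I}} A_{\mathcal{I}}^* y) = \|A_{\mathcal{I}}^* y\|^2 = 0$, i.e.

$$A_{\mathcal{I}}^* y = 0. \tag{1.4}$$

But $A_{\mathcal{I}}^* y$ is just a linear combination of vectors $a_i$, $i \in \mathcal{I}$ with
coefficients $y^i$, $i = 1, \dots, m$, where $y^i$ are components of vector $y$.
By the assumption that $a_i$, $i \in \mathcal{I}$ are linearly independent, this
combination cannot be zero. Therefore, (1.4) and consequently (1.3),
from which (1.4) was obtained, are not true. Thus, matrix $A_{\mathcal{I}} A_{\mathcal{I}}^*$
maps only the zero vector to zero, and this means that this
matrix is nonsingular. Let us now define operator $P$:

$$P = A_{\mathcal{I}}^* (A_{\mathcal{I}} A_{\mathcal{I}}^*)^{-1} A_{\mathcal{I}}. \tag{1.5}$$

It is easily seen that operator $P$ has the following properties:

$$PP = P, \tag{1.6}$$
$$P^* = P, \tag{1.7}$$
$$P(I - P) = (I - P)P = 0. \tag{1.8}$$

Operator $P$ is the operator of orthogonal projection onto the subspace
spanned by vectors $a_i$, $i \in \mathcal{I}$.
Indeed, for any vector $x \in E^n$

$$x = Px + (I - P)x.$$

Further, by (1.7), (1.8),

$$(Px, (I - P)x) = (x, P^*(I - P)x) = 0,$$

and so $Px$ and $(I - P)x$ are components of the orthogonal resolution
of vector $x$. Moreover,

$$Px = A_{\mathcal{I}}^* u = \sum_{i \in \mathcal{I}} a_i u^i,$$

where vector $u \in E^m$ with its components $u^i$ is defined by the formula
$u = (A_{\mathcal{I}} A_{\mathcal{I}}^*)^{-1} A_{\mathcal{I}} x$.

The expression for vector $Px$ shows that it lies wholly in the subspace
spanned by vectors $a_i$, $i \in \mathcal{I}$.

Note now that

$$A_{\mathcal{I}}(I - P) = A_{\mathcal{I}} - A_{\mathcal{I}} A_{\mathcal{I}}^* (A_{\mathcal{I}} A_{\mathcal{I}}^*)^{-1} A_{\mathcal{I}} = 0. \tag{1.9}$$

Therefore, for any $x \in E^n$, vector $y = (I - P)x$ satisfies the system
of equations $A_{\mathcal{I}} y = 0$.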
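
The properties of the projection operator are easy to check numerically. The following is a minimal sketch in plain Python, with the rows of `A` playing the role of the vectors $a_i$; the helper names and the example data are assumptions made for illustration. The product $Px$ is formed through the vector $u = (A A^*)^{-1} A x$ rather than by building the matrix $P$ explicitly.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    T = [list(M[i]) + [rhs[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(T[r][c]))
        T[c], T[piv] = T[piv], T[c]
        for r in range(c + 1, n):
            m = T[r][c] / T[c][c]
            for k in range(c, n + 1):
                T[r][k] -= m * T[c][k]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (T[i][n] - sum(T[i][k] * z[k] for k in range(i + 1, n))) / T[i][i]
    return z

def project(A, x):
    """Return P x, where P = A^T (A A^T)^{-1} A and the rows of A are the a_i."""
    gram = [[dot(ai, aj) for aj in A] for ai in A]     # A A^T
    u = solve(gram, [dot(ai, x) for ai in A])          # u = (A A^T)^{-1} A x
    return [sum(u[i] * A[i][k] for i in range(len(A))) for k in range(len(x))]

A = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]   # two linearly independent rows
x = [3.0, -1.0, 2.0]
Px = project(A, x)
rest = [xi - pi for xi, pi in zip(x, Px)]  # (I - P) x
```

Here `Px` lies in the span of the rows, `rest` is orthogonal to every row, and projecting `Px` again leaves it unchanged.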


Minimization of a Quadratic Function in a Subspace 


Suppose now that we have to minimize a quadratic function $f(x)$
defined by (1.1) with the constraints

$$(a_i, x) - b_i = 0, \quad i \in \mathcal{I}. \tag{1.10}$$

We assume that vectors $a_i$, $i \in \mathcal{I}$ are linearly independent.
Let $x_0$ be a point which satisfies (1.10).
Note that if we denote by $b_{\mathcal{I}}$ a vector whose components are $b_i$,
$i \in \mathcal{I}$, then the system of equations (1.10) can be written in the
form $A_{\mathcal{I}} x - b_{\mathcal{I}} = 0$, so that $A_{\mathcal{I}} x_0 - b_{\mathcal{I}} = 0$.
We now introduce a new variable $y$ defined as follows:

$$x = x_0 + (I - P)y \tag{1.11}$$

and consider the quadratic function

$$\varphi(y) = f(x_0 + (I - P)y).$$

The gradients of functions $\varphi(y)$ and $f(x)$, according to the rules
of differentiation of a composite function and the symmetry of
operator $P$, are related as follows:

$$\varphi'(y) = (I - P) f'(x), \tag{1.12}$$

where $x$ and $y$ are connected by (1.11).
Lemma 1.2. Let $y$ be the point of absolute minimum of function
$\varphi(y)$. Then the corresponding point

$$x = x_0 + (I - P)y$$

is the minimum point of function $f(x)$ with constraints (1.10).
Proof. At point $y$ the gradient of function $\varphi(y)$ becomes zero:
$\varphi'(y) = 0$. Therefore, by (1.12),

$$(I - P) f'(x) = 0$$

or

$$f'(x) - A_{\mathcal{I}}^* (A_{\mathcal{I}} A_{\mathcal{I}}^*)^{-1} A_{\mathcal{I}} f'(x) = 0.$$

Taking $u = -(A_{\mathcal{I}} A_{\mathcal{I}}^*)^{-1} A_{\mathcal{I}} f'(x)$, we obtain

$$f'(x) + A_{\mathcal{I}}^* u = 0. \tag{1.13}$$

Using (1.9) we obtain also

$$A_{\mathcal{I}} x = A_{\mathcal{I}} x_0 + A_{\mathcal{I}} (I - P) y = A_{\mathcal{I}} x_0 = b_{\mathcal{I}},$$

i.e. $x$ satisfies conditions (1.10).

Thus $x$ is a feasible point and at this point conditions (1.13)
are satisfied, which are necessary and sufficient for $x$ to be the
minimum point of $f(x)$ with conditions (1.10). The lemma is proved.

Lemma 1.2 shows that the problem under consideration can be
reduced to the minimization of the quadratic function $\varphi(y)$ without
constraints. To minimize $\varphi(y)$ we apply the method of conjugate
gradients (Chap. II, Sec. 4):

$$y_0 = 0, \qquad p_1 = -\varphi'(0),$$
$$y_{k+1} = y_k + \alpha_{k+1} p_{k+1},$$
$$p_{k+1} = -\varphi'(y_k) + \frac{\|\varphi'(y_k)\|^2}{\|\varphi'(y_{k-1})\|^2} \, p_k.$$

The quantity $\alpha_{k+1}$ in these formulas is calculated as follows:

$$\alpha_{k+1} = -\frac{(\varphi'(y_k), p_{k+1})}{(p_{k+1}, (I - P) C (I - P) p_{k+1})},$$

since it is easy to ascertain that the matrix which determines the
quadratic term of function $\varphi(y)$ has the form $(I - P) C (I - P)$.
These formulas determine the process involving the additional
variables $y$. It is, however, expedient to go over to the original
variables $x$. We first prove that the following relation holds:

$$(I - P) p_k = p_k. \tag{1.14}$$

Indeed, with $k = 1$ we have

$$(I - P) p_1 = -(I - P) \varphi'(0) = -(I - P)(I - P) f'(x_0) = -(I - P) f'(x_0) = -\varphi'(0) = p_1,$$




where we have made use of (1.12) and of the fact that

$$(I - P)(I - P) = I - P - (I - P)P = I - P.$$

Now suppose that (1.14) holds for $k$ and prove that it holds for
$k + 1$, where we again make use of (1.12) and (1.14):

$$(I - P) p_{k+1} = -(I - P) \varphi'(y_k) + \frac{\|\varphi'(y_k)\|^2}{\|\varphi'(y_{k-1})\|^2} (I - P) p_k = -(I - P)(I - P) f'(x_k) + \frac{\|\varphi'(y_k)\|^2}{\|\varphi'(y_{k-1})\|^2} \, p_k$$
$$= -(I - P) f'(x_k) + \frac{\|\varphi'(y_k)\|^2}{\|\varphi'(y_{k-1})\|^2} \, p_k = p_{k+1}.$$

It follows now from (1.11) that

$$x_{k+1} = x_0 + (I - P) y_{k+1},$$
$$x_{k+1} = x_k + (I - P)(y_{k+1} - y_k) = x_k + (I - P) \alpha_{k+1} p_{k+1},$$

i.e.

$$x_{k+1} = x_k + \alpha_{k+1} p_{k+1}.$$

Let us transform the formula for $p_{k+1}$, using (1.12):

$$p_{k+1} = -(I - P) f'(x_k) + \frac{\|(I - P) f'(x_k)\|^2}{\|(I - P) f'(x_{k-1})\|^2} \, p_k.$$

The formula for $\alpha_{k+1}$ now takes the following form:

$$\alpha_{k+1} = -\frac{((I - P) f'(x_k), p_{k+1})}{((I - P) p_{k+1}, C (I - P) p_{k+1})} = -\frac{(f'(x_k), p_{k+1})}{(p_{k+1}, C p_{k+1})}.$$


Theorem 1.1. The problem of the minimization of quadratic function
$f(x)$ with constraints (1.10), given the initial point $x_0$ which
satisfies (1.10), is solved after a finite number of steps by the following
process:

$$p_1 = -(I - P) f'(x_0),$$
$$x_{k+1} = x_k + \alpha_{k+1} p_{k+1},$$
$$p_{k+1} = -(I - P) f'(x_k) + \frac{\|(I - P) f'(x_k)\|^2}{\|(I - P) f'(x_{k-1})\|^2} \, p_k,$$
$$\alpha_{k+1} = -\frac{(f'(x_k), p_{k+1})}{(p_{k+1}, C p_{k+1})}, \quad k = 0, 1, \dots$$

The proof of this theorem was essentially given in the argument
used in deriving the formulas of the process.
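
The process of Theorem 1.1 can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' implementation: it takes $f'(x) = Cx + d$ with a positive definite $C$ (so that the denominator of $\alpha_{k+1}$ never vanishes), forms $(I - P) g$ as $g + A^* u$ with $u = -(A A^*)^{-1} A g$, and uses hypothetical helper names and example data.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def solve(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    T = [list(M[i]) + [rhs[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(T[r][c]))
        T[c], T[piv] = T[piv], T[c]
        for r in range(c + 1, n):
            m = T[r][c] / T[c][c]
            for k in range(c, n + 1):
                T[r][k] -= m * T[c][k]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (T[i][n] - sum(T[i][k] * z[k] for k in range(i + 1, n))) / T[i][i]
    return z

def project_out(A, g):
    """Return (I - P) g = g + A^T u with u = -(A A^T)^{-1} A g."""
    gram = [[dot(ai, aj) for aj in A] for ai in A]
    u = solve(gram, [-dot(ai, g) for ai in A])
    return [g[k] + sum(u[i] * A[i][k] for i in range(len(A))) for k in range(len(g))]

def projected_cg(C, d, A, x0, tol=1e-20):
    grad = lambda x: [c + di for c, di in zip(matvec(C, x), d)]  # f'(x) = Cx + d
    x = list(x0)
    r = project_out(A, grad(x))            # (I - P) f'(x_0)
    p = [-ri for ri in r]                  # p_1
    for _ in range(len(x)):
        rr = dot(r, r)
        if rr <= tol:                      # projected gradient has vanished
            break
        alpha = -dot(grad(x), p) / dot(p, matvec(C, p))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r_new = project_out(A, grad(x))
        beta = dot(r_new, r_new) / rr      # ratio of squared projected gradients
        p = [-rn + beta * pi for rn, pi in zip(r_new, p)]
        r = r_new
    return x

C = [[4.0, 1.0, 0.0], [1.0, 3.0, 0.0], [0.0, 0.0, 2.0]]
d = [-1.0, -2.0, -3.0]
A = [[1.0, 1.0, 1.0]]      # single constraint x1 + x2 + x3 = 1
x0 = [1.0, 0.0, 0.0]       # feasible starting point
x_min = projected_cg(C, d, A, x0)
```

Every direction `p` lies in the null space of `A`, so all iterates stay on the plane through `x0`; the loop stops as soon as the projected gradient vanishes.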

Remark. As we know (Chap. II, Sec. 4), if the method of conjugate
gradients is applied to a quadratic function with a singular matrix $C$,
then the process converges after a number of steps not exceeding
$n - l$, where $l$ is the number of zero eigenvalues of matrix $C$. In
minimizing $\varphi(y)$ we applied this method to a function whose matrix
was $(I - P) C (I - P)$. But since $A_{\mathcal{I}}(I - P) = 0$, that is,
$(I - P) A_{\mathcal{I}}^* = 0$, we have $(I - P) a_i = 0$, $i \in \mathcal{I}$. Therefore, in the
case under consideration the number of zero eigenvalues of matrix
$(I - P) C (I - P)$ is not less than $m$, where $m$ is the number of vectors $a_i$,
$i \in \mathcal{I}$. Therefore, the process suggested either converges to the
minimum point or shows that the quadratic function $f(x)$ has no lower
bound under constraints (1.10) after a number of steps not exceeding $n - m$.


Algorithm of General Problem 
of Quadratic Programming 


Let us now return to the general problem (1.1), (1.2). For each
point $x$ which satisfies (1.2) we set

$$\mathcal{I}(x) = \{i : (a_i, x) - b_i = 0, \ i \in J^- \cup J^0\}.$$

In what follows we assume that the following condition of
nondegeneracy is fulfilled: for any $x$, vectors $a_i$, $i \in \mathcal{I}(x)$ are linearly
independent.

We now propose the algorithm for solving the problem.

Let $x_0$ be an arbitrary point which satisfies (1.2) and is the first
approximation. Take the set of indices $\mathcal{I}_0 = \mathcal{I}(x_0)$ and construct
operator $P_{\mathcal{I}_0}$:

$$P_{\mathcal{I}_0} = A_{\mathcal{I}_0}^* (A_{\mathcal{I}_0} A_{\mathcal{I}_0}^*)^{-1} A_{\mathcal{I}_0}.$$

Calculate the quantities

$$u_0 = -(A_{\mathcal{I}_0} A_{\mathcal{I}_0}^*)^{-1} A_{\mathcal{I}_0} f'(x_0), \qquad (I - P_{\mathcal{I}_0}) f'(x_0) = f'(x_0) + A_{\mathcal{I}_0}^* u_0.$$

There are two possible cases:
(1) $(I - P_{\mathcal{I}_0}) f'(x_0) = 0$. Here

$$f'(x_0) + A_{\mathcal{I}_0}^* u_0 = 0 \tag{1.15}$$

and point $x_0$ is the minimum point of $f(x)$ on the face defined by the
system of equations

$$(a_i, x) - b_i = 0, \quad i \in \mathcal{I}_0$$

(see Chap. I, Sec. 3).

If there are no negative components among $u_0^i$, the components of
vector $u_0$, $i \in \mathcal{I}(x_0) \cap J^-$, then (see Chap. I, Sec. 3) point $x_0$ is the
solution of the primal problem (1.1), (1.2), for in this case (1.15)
are the necessary and sufficient conditions for the minimum of
function $f(x)$ with constraints (1.2).
Suppose now that there is an index $j \in \mathcal{I}(x_0) \cap J^-$ such that $u_0^j < 0$.
Construct a new set of indices $\mathcal{I}_0'$ by deleting index $j$. We apply
the method of conjugate gradients described in the subsection on
p. 148 to solving the problem of minimization of $f(x)$ with
constraints

$$(a_i, x) - b_i = 0, \quad i \in \mathcal{I}_0'. \tag{1.16}$$

However, in applying the method of conjugate gradients, the process
must not transgress the limits (1.2). Therefore at every step of the
algorithm the following check should be made. Compute the quantity

$$\bar\alpha_{k+1} = \min_i \frac{b_i - (a_i, x_k)}{(a_i, p_{k+1})}, \tag{1.17}$$

where the minimum is taken over all $i$ for which $(a_i, p_{k+1}) > 0$.
In this formula $x_k$ is the point just constructed by the algorithm and
$p_{k+1}$ is the conjugate direction at this point.
Let now $\alpha_{k+1}$ be the corresponding step length in the method of
conjugate gradients. If $\alpha_{k+1} < \bar\alpha_{k+1}$, then $x_{k+1} = x_k + \alpha_{k+1} p_{k+1}$
and the process goes on. If however $\alpha_{k+1} \ge \bar\alpha_{k+1}$, then
$x_{k+1} = x_k + \bar\alpha_{k+1} p_{k+1}$ and the process stops.

Thus, either we find the minimum point of $f(x)$ under
conditions (1.16) or the process will be truncated when $\alpha_{k+1} \ge \bar\alpha_{k+1}$.
In both cases we take the point obtained to be the initial point and
proceed using the new point as we did with the initial one, $x_0$.
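
The check (1.17) is an ordinary ratio test. A minimal sketch follows; the function names, the representation of constraints as pairs $(a_i, b_i)$, and the example data are assumptions made for illustration. When no constraint has $(a_i, p_{k+1}) > 0$, the bound is taken to be $+\infty$.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def max_feasible_step(constraints, x, p):
    """Formula (1.17): the largest alpha with (a_i, x + alpha p) <= b_i for all i.

    constraints is a list of pairs (a_i, b_i). Returns (alpha_bar, blocking),
    where blocking lists the indices attaining the minimum.
    """
    alpha_bar, blocking = math.inf, []
    for i, (a, b) in enumerate(constraints):
        slope = dot(a, p)
        if slope > 0:                      # only these constraints can be hit
            ratio = (b - dot(a, x)) / slope
            if ratio < alpha_bar:
                alpha_bar, blocking = ratio, [i]
            elif ratio == alpha_bar:
                blocking.append(i)
    return alpha_bar, blocking

def take_step(x, p, alpha, alpha_bar):
    """Truncate the conjugate-gradient step; the flag tells whether the run stops."""
    step = min(alpha, alpha_bar)
    return [xi + step * pi for xi, pi in zip(x, p)], alpha >= alpha_bar

cons = [([1.0, 0.0], 2.0), ([0.0, 1.0], 1.0), ([-1.0, 0.0], 0.0)]
alpha_bar, blocking = max_feasible_step(cons, [0.0, 0.0], [1.0, 1.0])
x_long, stopped_long = take_step([0.0, 0.0], [1.0, 1.0], 3.0, alpha_bar)
x_short, stopped_short = take_step([0.0, 0.0], [1.0, 1.0], 0.5, alpha_bar)
```

In the example the constraint $x^2 \le 1$ blocks first, so a proposed step of 3 is cut back to 1 and the run on the current face stops, while a proposed step of 0.5 is accepted in full.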

(2) $(I - P_{\mathcal{I}_0}) f'(x_0) \ne 0$.

In this case we apply the method of conjugate gradients to solving
the problem of minimization of $f(x)$ with constraints

$$(a_i, x) - b_i = 0, \quad i \in \mathcal{I}_0, \tag{1.18}$$

starting at point $x_0$. At every step, as before, a check is made
whether the points obtained are feasible or not, i.e. we calculate
$\bar\alpha_{k+1}$ by formula (1.17) and apply the process of conjugate gradients
until either we find the minimum point of $f(x)$ with constraints (1.18)
or the condition $\alpha_{k+1} \ge \bar\alpha_{k+1}$ is satisfied and the point
$x_{k+1} = x_k + \bar\alpha_{k+1} p_{k+1}$ is obtained. In both cases we take the point obtained
as the initial one and repeat at it the operations performed with $x_0$.

Let us substantiate the convergence of the method after a finite
number of steps. We must first of all show that in case (1) as well
as in case (2) a successful step will be made, i.e. we move from
point $x_0$ to a new point at which the value of function $f(x)$ will be
strictly less than $f(x_0)$.

New points are obtained by the method of conjugate gradients,
and in this method the function decreases at each step. Therefore,
the only thing we have to show is that $\bar\alpha_{k+1} > 0$ always, i.e. that
constraints (1.2) permit a nonzero step in the direction chosen,
$p_{k+1}$, and besides that in case (1) point $x_0$ is not the minimum point
of $f(x)$ with constraints (1.16), for if it were so the method of
conjugate gradients would not have moved the process from point $x_0$.
Let us prove several subsidiary lemmas.
Lemma 1.3. Vector $p_1 = -(I - P_{\mathcal{I}_0}) f'(x_0)$ is the solution of
the problem of minimizing the function

$$\varphi(p) = (f'(x_0), p) + \tfrac{1}{2} \|p\|^2$$

with constraints

$$A_{\mathcal{I}_0} p = 0. \tag{1.19}$$

Proof. Indeed, by (1.9), $p_1$ satisfies (1.19). Moreover, $\varphi'(p) = p + f'(x_0)$.
Therefore,

$$\varphi'(p_1) = p_1 + f'(x_0) = -(I - P_{\mathcal{I}_0}) f'(x_0) + f'(x_0) = P_{\mathcal{I}_0} f'(x_0) = -A_{\mathcal{I}_0}^* u_0.$$

Hence,

$$\varphi'(p_1) + A_{\mathcal{I}_0}^* u_0 = 0. \tag{1.20}$$

The last expression is the necessary and sufficient condition for the
convex function $\varphi(p)$ to attain its minimum at point $p_1$ with
constraints (1.19). The lemma is proved.

We formulate now a problem which is the dual of the problem of
minimizing $\varphi(p)$ with constraints (1.19). According to the rules
stated in Sec. 3 of Chap. I, we have to find the minimum over $p$ of the function
$\varphi(p) + u^* A_{\mathcal{I}_0} p$. Differentiating with respect to $p$ and equating the
derivatives to zero, we obtain $p + f'(x_0) + A_{\mathcal{I}_0}^* u = 0$, i.e.

$$p = -f'(x_0) - A_{\mathcal{I}_0}^* u.$$

Substituting this expression for $p$, we obtain that

$$\min_p \{\varphi(p) + u^* A_{\mathcal{I}_0} p\} = -\tfrac{1}{2} \|f'(x_0) + A_{\mathcal{I}_0}^* u\|^2.$$

Thus the dual problem consists in finding over all possible vectors $u$
the maximum of the function

$$\varphi^*(u) = -\tfrac{1}{2} \|f'(x_0) + A_{\mathcal{I}_0}^* u\|^2.$$

Now differentiating $\varphi^*(u)$ and equating the derivatives to zero,
we can easily ascertain that vector

$$u_0 = -(A_{\mathcal{I}_0} A_{\mathcal{I}_0}^*)^{-1} A_{\mathcal{I}_0} f'(x_0)$$

is the solution of the dual problem, i.e. maximizes $\varphi^*(u)$. Recall
that $u_0^i$, $i \in \mathcal{I}_0$ are the components of $u_0$. Thus, vector $u_0$ is the
vector of Lagrange multipliers in the problem of the minimization
of $\varphi(p)$ with constraints (1.19). Besides, we obtain that the
value of the minimum of $\varphi(p)$ with constraints (1.19) and of the
maximum of $\varphi^*(u)$ over $u$, which is the same by the duality theorems,
is equal to

$$-\tfrac{1}{2} \|f'(x_0) + A_{\mathcal{I}_0}^* u_0\|^2 \quad \text{or} \quad -\tfrac{1}{2} \|(I - P_{\mathcal{I}_0}) f'(x_0)\|^2.$$

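
The substitution step in the derivation of the dual function can be written out explicitly. The abbreviation $w$ below is introduced only for this worked computation; everything else uses the symbols of the text.

```latex
% Abbreviation (an assumption of this note): w = f'(x_0) + A_{I_0}^* u,
% so that the minimizing p is p = -w.
\begin{aligned}
\varphi(p) + u^* A_{\mathcal{I}_0} p
  &= (f'(x_0), p) + \tfrac{1}{2}\|p\|^2 + (A_{\mathcal{I}_0}^* u, p) \\
  &= (w, p) + \tfrac{1}{2}\|p\|^2 \\
  &= -\|w\|^2 + \tfrac{1}{2}\|w\|^2 \qquad (p = -w) \\
  &= -\tfrac{1}{2}\,\|f'(x_0) + A_{\mathcal{I}_0}^* u\|^2 .
\end{aligned}
```

Since this expression is never positive and vanishes exactly when $f'(x_0) + A_{\mathcal{I}_0}^* u = 0$, the dual optimal value is zero precisely in case (1) of the algorithm.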


Lemma 1.4. Let matrix $A_{\mathcal{I}_0'}$ be formed from $A_{\mathcal{I}_0}$ by deleting the row
with index $j$ for which $u_0^j < 0$, and let $(I - P_{\mathcal{I}_0}) f'(x_0) = 0$. Then
vector $p_1 = -(I - P_{\mathcal{I}_0'}) f'(x_0)$ is not zero and $(a_j, p_1) < 0$.
Proof. Vector $p_1$ can be written in the following form:

$$p_1 = -(f'(x_0) + A_{\mathcal{I}_0'}^* v), \qquad v = -(A_{\mathcal{I}_0'} A_{\mathcal{I}_0'}^*)^{-1} A_{\mathcal{I}_0'} f'(x_0).$$

If $p_1 = 0$, then $f'(x_0) + A_{\mathcal{I}_0'}^* v = 0$. But on the other hand, by
assumption,

$$(I - P_{\mathcal{I}_0}) f'(x_0) = f'(x_0) + A_{\mathcal{I}_0}^* u_0 = 0. \tag{1.21}$$

Subtracting from (1.21) the first equality, we obtain

$$A_{\mathcal{I}_0}^* u_0 - A_{\mathcal{I}_0'}^* v = u_0^j a_j + \sum_{i \in \mathcal{I}_0'} (u_0^i - v^i) a_i = 0,$$

which, as $u_0^j \ne 0$, is impossible since vectors $a_i$, $i \in \mathcal{I}_0$ are linearly
independent. We now prove the second part of the lemma.
Rewrite (1.21) in the component form:

$$f'(x_0) + \sum_{i \in \mathcal{I}_0'} u_0^i a_i + (-u_0^j)(-a_j) = 0. \tag{1.22}$$

Note that $-u_0^j > 0$ since $u_0^j < 0$. Consider the problem of
minimizing $\varphi(p) = (p, f'(x_0)) + \tfrac{1}{2} \|p\|^2$ with constraints

$$(a_i, p) = 0, \quad i \in \mathcal{I}_0', \qquad -(a_j, p) \le 0. \tag{1.23}$$

Since $\varphi'(p) = f'(x_0) + p$, we have $\varphi'(0) = f'(x_0)$, and
therefore (1.22) is the necessary and sufficient condition for the point
$p = 0$ to be the solution of the problem of minimization of $\varphi(p)$
with constraints (1.23). On the other hand, by Lemma 1.3, $p_1$ is the
solution of the problem of minimization of $\varphi(p)$ with constraints
$A_{\mathcal{I}_0'} p = 0$ or, in the component form,

$$(a_i, p) = 0, \quad i \in \mathcal{I}_0'. \tag{1.24}$$

Suppose that $(a_j, p_1) \ge 0$. Since $p_1$ satisfies constraints (1.24), it
satisfies all constraints (1.23) too. But

$$(f'(x_0), p_1) = -(f'(x_0), (I - P_{\mathcal{I}_0'}) f'(x_0))$$
$$= -(P_{\mathcal{I}_0'} f'(x_0) + (I - P_{\mathcal{I}_0'}) f'(x_0), (I - P_{\mathcal{I}_0'}) f'(x_0))$$
$$= -((I - P_{\mathcal{I}_0'}) f'(x_0), (I - P_{\mathcal{I}_0'}) f'(x_0)) = -\|p_1\|^2.$$

Therefore,

$$\varphi(p_1) = (f'(x_0), p_1) + \tfrac{1}{2} \|p_1\|^2 = -\tfrac{1}{2} \|p_1\|^2 < 0.$$

The last inequality contradicts the fact that the minimum value of
$\varphi(p)$ with constraints (1.23) is attained with $p = 0$ and is equal to
zero. This contradiction shows that $(a_j, p_1) < 0$. The lemma is
proved.
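
Lemma 1.4 can be checked on a two-dimensional example. The sketch below is illustrative only: the rows of `A0` stand for the vectors $a_i$ of the active constraints, `g` for an assumed gradient $f'(x_0)$, and the helper names are hypothetical. With two independent rows in the plane the projected gradient vanishes, so the situation is that of case (1).

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    T = [list(M[i]) + [rhs[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(T[r][c]))
        T[c], T[piv] = T[piv], T[c]
        for r in range(c + 1, n):
            m = T[r][c] / T[c][c]
            for k in range(c, n + 1):
                T[r][k] -= m * T[c][k]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (T[i][n] - sum(T[i][k] * z[k] for k in range(i + 1, n))) / T[i][i]
    return z

def multipliers(A, g):
    """u = -(A A^T)^{-1} A g for a matrix A with linearly independent rows."""
    gram = [[dot(ai, aj) for aj in A] for ai in A]
    return solve(gram, [-dot(ai, g) for ai in A])

def projected_direction(A, g):
    """p = -(I - P) g = -(g + A^T u)."""
    u = multipliers(A, g)
    return [-(g[k] + sum(u[i] * A[i][k] for i in range(len(A)))) for k in range(len(g))]

A0 = [[1.0, 1.0], [1.0, -1.0]]   # rows a_i of the constraints active at x_0
g = [1.0, 3.0]                   # assumed gradient f'(x_0)
u0 = multipliers(A0, g)          # one multiplier is negative
j = min(range(len(u0)), key=lambda i: u0[i])
A_reduced = [row for i, row in enumerate(A0) if i != j]
p1 = projected_direction(A_reduced, g)
```

After the row with the negative multiplier is dropped, `p1` is nonzero, stays on the remaining constraint, and satisfies $(a_j, p_1) < 0$, i.e. it moves strictly into the interior of the dropped constraint.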

We return now to the algorithm constructed. Let us consider
case (1) and let point $x_0$ be not the solution of the problem of
quadratic programming. According to the algorithm, we should apply the
method of conjugate gradients in order to minimize function $f(x)$
with constraints (1.16). In accordance with the formulas of the
method, the first step is made in the direction of vector

$$p_1 = -(I - P_{\mathcal{I}_0'}) f'(x_0).$$

By Lemma 1.4, $p_1 \ne 0$ and consequently point $x_0$ is not the
solution of the subsidiary problem of minimization under consideration.

We now demonstrate that $\bar\alpha_1 > 0$.

Indeed, vector $p_1$ satisfies condition (1.24), and $(a_j, p_1) < 0$
according to Lemma 1.4. Therefore

$$(a_i, p_1) \le 0, \quad i \in \mathcal{I}_0. \tag{1.25}$$

For $i \notin \mathcal{I}_0$, by the method of choosing set $\mathcal{I}_0$,

$$(a_i, x_0) - b_i < 0.$$

Therefore

$$\bar\alpha_1 = \min_i \frac{b_i - (a_i, x_0)}{(a_i, p_1)} > 0,$$

since the minimum is taken only over those $i$ for which $(a_i, p_1) > 0$
and consequently over a certain subset of indices $i$ that
does not intersect $\mathcal{I}_0$, according to (1.25), and

$$b_i - (a_i, x_0) > 0$$

for such $i$.
The fact that $\bar\alpha_1 > 0$ indicates that all of the points $x_0 + \alpha p_1$
with $0 \le \alpha \le \bar\alpha_1$ satisfy conditions (1.2). Indeed, for $i \in \mathcal{I}_0$ and $\alpha > 0$,

$$(a_i, x_0 + \alpha p_1) - b_i = (a_i, x_0) - b_i + \alpha (a_i, p_1) = \alpha (a_i, p_1) \begin{cases} = 0, & i \in \mathcal{I}_0', \\ < 0, & i = j. \end{cases}$$

For $i \notin \mathcal{I}_0$

$$(a_i, x_0 + \alpha p_1) - b_i = (a_i, x_0) - b_i + \alpha (a_i, p_1) < 0$$

if $(a_i, p_1) \le 0$; however, if $(a_i, p_1) > 0$, then

$$\alpha \le \bar\alpha_1 \le \frac{b_i - (a_i, x_0)}{(a_i, p_1)}$$

and therefore

$$(a_i, x_0 + \alpha p_1) - b_i \le (a_i, x_0) - b_i + \frac{b_i - (a_i, x_0)}{(a_i, p_1)} (a_i, p_1) = 0.$$

Note that the sign of inequality in the last expression is to be
considered strict if $\alpha < \bar\alpha_1$ or $\bar\alpha_1 < \dfrac{b_i - (a_i, x_0)}{(a_i, p_1)}$.


According to the algorithm, two cases are possible: $\alpha_1 < \bar\alpha_1$ and
$\alpha_1 \ge \bar\alpha_1$. In the first case we obtain a new point $x_1 = x_0 + \alpha_1 p_1$
that satisfies the relations

$$(a_i, x_1) - b_i = 0, \quad i \in \mathcal{I}_0', \qquad (a_i, x_1) - b_i < 0, \quad i \notin \mathcal{I}_0'. \tag{1.26}$$

In the second case, we obtain point $x_1 = x_0 + \bar\alpha_1 p_1$ and it is taken
as a new initial point from which the algorithm begins to operate
by checking case (1) or (2). Point $x_1$ satisfies conditions
$(a_i, x_1) - b_i = 0$, $i \in \mathcal{I}_0'$, and moreover the equalities $(a_i, x_1) - b_i = 0$ for
all $i \notin \mathcal{I}_0$ such that

$$\frac{b_i - (a_i, x_0)}{(a_i, p_1)} = \bar\alpha_1.$$

Thus $\mathcal{I}(x_1) \supset \mathcal{I}_0'$, the inclusion being strict.
We return now to the case when $\alpha_1 < \bar\alpha_1$. Here the application of
the method of conjugate gradients goes on, and so long as $\alpha_{k+1} < \bar\alpha_{k+1}$
holds, all of the points $x_{k+1}$ continue to satisfy the relations (1.26)
(like $x_1$), since by (1.14) and (1.9) with $P = P_{\mathcal{I}_0'}$ we have

$$A_{\mathcal{I}_0'} p_k = A_{\mathcal{I}_0'} (I - P_{\mathcal{I}_0'}) p_k = 0,$$

i.e., if written in the component form,

$$(a_i, p_k) = 0, \quad i \in \mathcal{I}_0'.$$

The inequalities

$$(a_i, x_k) - b_i < 0, \quad i \notin \mathcal{I}_0'$$

also will not be violated, for their violation would mean that the
case where $\alpha_{k+1} \ge \bar\alpha_{k+1}$ takes place.

Thus, we have demonstrated that in case (1) the iterative process
constructs successively points $x_0, x_1, \dots, x_k$, $k \ge 1$, and the value
of $f(x)$ strictly decreases along this sequence because it is constructed
by the method of conjugate directions. The last point $x_k$ is either the
minimum of $f(x)$ with constraints (1.16) or at this point $\mathcal{I}(x_k)$
strictly contains set $\mathcal{I}_0'$.

In case (2) the direction of motion from point $x_0$ coincides with
vector $p_1 = -(I - P_{\mathcal{I}_0}) f'(x_0) \ne 0$, $(a_i, p_1) = 0$, $i \in \mathcal{I}_0$ $(= \mathcal{I}(x_0))$,
and therefore it can be easily shown that $\bar\alpha_1 > 0$ and the method of
conjugate directions makes it possible to make at least one nonzero
step to the new point $x_1$ at which the value of $f(x)$ is strictly smaller.

All the proofs in this case are analogous to those given above.
We obtain as a result the sequence of points $x_0, x_1, \dots, x_k$, $k \ge 1$,
and $x_k$ is either the minimum of $f(x)$ with constraints (1.18), or
$\mathcal{I}(x_k) \supset \mathcal{I}_0$.

Note that in case (1) as well as in case (2), if point $x_k$ is the
minimum point of $f(x)$ with constraints (1.16) or (1.18) respectively,
then in both cases $x_k$ is the minimum of $f(x)$ on that face of the
polyhedral set which is determined by expressions

$$(a_i, x) - b_i = 0, \quad i \in \mathcal{I}(x_k), \tag{1.27}$$

since, by construction, $\mathcal{I}(x_k) \supseteq \mathcal{I}_0'$ in case (1) and $\mathcal{I}(x_k) \supseteq \mathcal{I}_0$ in
case (2), and the minimum point on the broader set is the minimum
point also on the narrower one.

We show now that after a finite number of steps starting with
point $x_0$ we shall inevitably come to a point $x_k$ which is itself the
minimum point of $f(x)$ with constraints (1.27). Indeed, it can be seen
from the foregoing that if the method of conjugate gradients does
not result in the finding of the minimum point, then the set of
indices $i$ for which the next point obtained satisfies the relations
$(a_i, x_k) - b_i = 0$ is extended. Since, by
assumption, vectors $a_i$, $i \in \mathcal{I}(x_k)$ are linearly independent, it is
clear that this extension must be truncated after a finite number of
steps not exceeding $n$, where $n$ is the dimension of $x$.

Thus after a number of steps not exceeding $n$ the algorithm described
constructs the next point $x_k$ which is the minimum of $f(x)$ with
constraints (1.27).

Note that the sets $\mathcal{I}(x_k)$ with different $x_k$ are different, for the value
of function $f(x)$ decreases monotonically along the sequence
constructed. Indeed, let $x_m$ and $x_k$, $m > k$, be minimum points of $f(x)$
with constraints

$$(a_i, x) - b_i = 0,$$

$i \in \mathcal{I}(x_m)$ and $i \in \mathcal{I}(x_k)$ respectively. If $\mathcal{I}(x_m) = \mathcal{I}(x_k)$, then
clearly $f(x_m) = f(x_k)$. But according to the construction of the
process, $f(x_m) < f(x_k)$ with $m > k$, and thus the equality
$\mathcal{I}(x_m) = \mathcal{I}(x_k)$ does not hold true.

On the other hand, all sets $\mathcal{I}(x_k)$ are subsets of the finite set
$J = J^- \cup J^0$ and therefore the number of such subsets is finite.
It follows that the process proposed must be truncated after a finite
number of steps. But this can occur only if the minimum point of
$f(x)$ with constraints (1.2) has been found; otherwise, as shown above,
the process can go on.

This proves that the process converges after a finite number of
steps.

Remark. If matrix $C$ is singular, then according to the theory of
methods of conjugate directions it can occur that $(f'(x_k), p_{k+1}) \ne 0$
at point $x_k$ but $(p_{k+1}, C p_{k+1}) = 0$. In this case $\alpha_{k+1}$ cannot be
calculated, since

$$\alpha_{k+1} = -\frac{(f'(x_k), p_{k+1})}{(p_{k+1}, C p_{k+1})}.$$

However, in this case $f(x_k + \alpha p_{k+1})$ decreases with increasing $\alpha$,
and therefore we can assume that $\alpha_{k+1} = +\infty$ and perform the
calculations as usual. If $\bar\alpha_{k+1} < +\infty$, then the application of the
method of conjugate gradients will terminate at point
$x_{k+1} = x_k + \bar\alpha_{k+1} p_{k+1}$; this does not affect the above argument in any way.
However, if $\bar\alpha_{k+1}$ also has no limit, i.e. if $(a_i, p_{k+1}) \le 0$ for all $i$,
then the motion along the ray $x_k + \alpha p_{k+1}$ results in a decrease,
without limit, of function $f(x)$. This means that the problem of
quadratic programming which we consider has no solution, since
the lower bound of $f(x)$ with constraints (1.2) is $-\infty$.


Computational Aspects 


The algorithm proposed comprises in essence only one complicated
computational operation: the projecting of the gradient on a
subspace, i.e. the calculation of the quantity $(I - P_{\mathcal{I}}) f'(x)$. There are
two ways of performing this calculation.
The first one consists in a direct calculation of matrix $P_{\mathcal{I}}$, i.e.
$P_{\mathcal{I}} = A_{\mathcal{I}}^* (A_{\mathcal{I}} A_{\mathcal{I}}^*)^{-1} A_{\mathcal{I}}$. This involves calculating matrix $(A_{\mathcal{I}} A_{\mathcal{I}}^*)^{-1}$.
If this matrix is known, then the calculation of the required vector
$u = -(A_{\mathcal{I}} A_{\mathcal{I}}^*)^{-1} A_{\mathcal{I}} f'(x)$ is reduced to multiplying the matrix
by the vector.
In order to diminish the amount of computations at each step,
when set $\mathcal{I}$ changes, one can make use of the fact that by deleting
index $j$ we delete in matrix $A_{\mathcal{I}} A_{\mathcal{I}}^*$ one column and one row, thus
obtaining matrix $A_{\mathcal{I}'} A_{\mathcal{I}'}^*$. In just the same way, in adding an
additional index to set $\mathcal{I}$, matrix $A_{\mathcal{I}} A_{\mathcal{I}}^*$ acquires an additional
column and row.

This makes it possible to use the following recursive formulas
known from linear algebra (see D. K. Faddeev and V. N. Faddeeva).

Let $B$ be an arbitrary symmetric $n \times n$ matrix which can be
written in the form

$$B = \begin{pmatrix} D & u \\ u^* & b \end{pmatrix},$$

where $D$ is an $(n - 1) \times (n - 1)$ matrix, $u$ is an $(n - 1)$-dimensional
column-vector, $u^*$ is its transposed vector, $b$ is a number. Then
it can be easily ascertained that

$$B^{-1} = \begin{pmatrix} D^{-1} + \dfrac{D^{-1} u u^* D^{-1}}{a} & -\dfrac{D^{-1} u}{a} \\[2mm] -\dfrac{u^* D^{-1}}{a} & \dfrac{1}{a} \end{pmatrix}, \qquad a = b - u^* D^{-1} u.$$

Thus if matrix $D^{-1}$ is known, then matrix $B^{-1}$, where $B$ is obtained
by adding the last column and the last row, can be obtained by
simple calculations.

Conversely, if matrix $B^{-1}$ is of the form

$$B^{-1} = \begin{pmatrix} G & p \\ p^* & m \end{pmatrix},$$

then for matrix $D^{-1}$ we have

$$D^{-1} = G - \frac{p p^*}{m}.$$



Thus if the new matrix is obtained from the original one by
deleting the last row and the last column or by adding a row and a column,
then the inverse matrices are obtained by simple arithmetic
operations. The fact that in the formulas given above we deleted the last
column and row does not matter, for it can be easily checked that
a transposition of rows in the original matrix leads simply to a
transposition of columns in the inverted matrix, and a
transposition of columns to a transposition of rows.

Thus we have shown that the calculation of the matrix of projec- 
tion can be performed by recursive formulas. The drawback of these 
recursive computations is that they may lead to a great cumulative 
computation error. 
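The two recursive formulas can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `add_row_col` and `drop_last_row_col` are hypothetical, matrices are plain lists of rows, $D$ is assumed symmetric (so that $u^* D^{-1}$ is the transpose of $D^{-1} u$), and exact rational arithmetic is used so that the cumulative rounding error mentioned above does not obscure the check.

```python
from fractions import Fraction


def add_row_col(D_inv, u, b):
    """Return B^{-1} for B = [[D, u], [u*, b]], given the inverse of symmetric D."""
    n = len(u)
    # w = D^{-1} u; because D is symmetric, u* D^{-1} is the transpose of w.
    w = [sum(D_inv[i][j] * u[j] for j in range(n)) for i in range(n)]
    a = b - sum(u[i] * w[i] for i in range(n))  # a = b - u* D^{-1} u
    rows = [[D_inv[i][j] + w[i] * w[j] / a for j in range(n)] + [-w[i] / a]
            for i in range(n)]
    rows.append([-w[j] / a for j in range(n)] + [1 / a])
    return rows


def drop_last_row_col(B_inv):
    """Return D^{-1} = G - p p*/m, given B^{-1} = [[G, p], [p*, m]]."""
    n = len(B_inv) - 1
    m = B_inv[n][n]
    p = [B_inv[i][n] for i in range(n)]
    return [[B_inv[i][j] - p[i] * p[j] / m for j in range(n)] for i in range(n)]


# D = (2), u = (1), b = 3, so B = [[2, 1], [1, 3]].
D_inv = [[Fraction(1, 2)]]
B_inv = add_row_col(D_inv, [Fraction(1)], Fraction(3))
D_inv_again = drop_last_row_col(B_inv)
```

With floating-point numbers instead of `Fraction`, repeated updates of this kind would accumulate rounding error, which is the drawback noted in the text.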
Let us describe another way of computation.

It was shown in the subsection on p. 151 that vector $p_0 = -(I - P_{\mathcal{J}}) f'(x)$ is the solution of the problem of minimizing $(f'(x), p) + \frac{1}{2}\|p\|^2$ with constraints $A_{\mathcal{J}} p = 0$. It is expedient to go over to the dual problem which, as demonstrated above, consists in the maximization of the quadratic function

$$-\frac{1}{2}\,\|f'(x) + A_{\mathcal{J}}^* u\|^2$$

over vector $u$ without constraints. This problem can be easily solved by the method of conjugate directions. As was shown in the subsection on p. 191, its solution is vector $u_0 = -(A_{\mathcal{J}} A_{\mathcal{J}}^*)^{-1} A_{\mathcal{J}} f'(x)$, i.e. the vector which is required for the application of the algorithm for solving the general problem of quadratic programming. Vector $p_0$ is easily calculated using $u_0$ and the following formula:

$$p_0 = -(I - P_{\mathcal{J}}) f'(x) = -\left[f'(x) - A_{\mathcal{J}}^* (A_{\mathcal{J}} A_{\mathcal{J}}^*)^{-1} A_{\mathcal{J}} f'(x)\right] = -\left[f'(x) + A_{\mathcal{J}}^* u_0\right],$$

i.e.

$$p_0 = -\left[f'(x) + A_{\mathcal{J}}^* u_0\right].$$


Thus in using the second way of computing, the operation is 
reduced to applying many times the standard procedure of the 
method of conjugate directions. 
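A minimal sketch of this second way is given below. It assumes that the rows of the matrix `A` (standing for $A_{\mathcal{J}}$) are linearly independent and uses a textbook conjugate gradient iteration for the system $A_{\mathcal{J}} A_{\mathcal{J}}^* u = -A_{\mathcal{J}} f'(x)$, which is the stationarity condition of the dual problem. The helper names are hypothetical and not taken from the book.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(M, v):
    return [dot(row, v) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def conjugate_gradient(apply_Q, rhs, tol=1e-24):
    """Solve Q u = rhs for a symmetric positive definite operator Q."""
    u = [0.0] * len(rhs)
    r = list(rhs)          # residual rhs - Q u for the starting point u = 0
    d = list(r)            # first search direction
    rr = dot(r, r)
    for _ in range(len(rhs)):
        if rr <= tol:
            break
        Qd = apply_Q(d)
        alpha = rr / dot(d, Qd)
        u = [ui + alpha * di for ui, di in zip(u, d)]
        r = [ri - alpha * qi for ri, qi in zip(r, Qd)]
        rr_new = dot(r, r)
        d = [ri + (rr_new / rr) * di for ri, di in zip(r, d)]
        rr = rr_new
    return u


def projected_direction(A, g):
    """Return (u0, p0) with u0 = -(A A*)^{-1} A g and p0 = -(g + A* u0)."""
    At = transpose(A)
    u0 = conjugate_gradient(lambda v: matvec(A, matvec(At, v)),
                            [-c for c in matvec(A, g)])
    p0 = [-(gi + ti) for gi, ti in zip(g, matvec(At, u0))]
    return u0, p0


# One active constraint x^1 + x^2 = const and gradient g = (1, 2, 3).
u0, p0 = projected_direction([[1.0, 1.0, 0.0]], [1.0, 2.0, 3.0])
```

In this example `p0` is orthogonal to the single row of `A`, i.e. the direction stays in the subspace $A_{\mathcal{J}} p = 0$.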


Problem of Quadratic Programming
with Simple Constraints


The problem with simple constraints is understood to be the problem of minimizing a quadratic function $f(x)$ with constraints $x^i \ge 0$, $i \in \mathcal{I}$, where $\mathcal{I}$ is a subset of the set $\{1, 2, \ldots, n\}$. In this case the algorithm of the subsection on p. 151 is considerably simplified. Instead of performing these simplifications formally, we shall formulate an algorithm for solving the problem. Its description will make it clear that the proof of its convergence after a finite number of steps coincides with the proof for the algorithm of the subsection on p. 101. So, let $x_0$ be an arbitrary point which satisfies constraints $x_0^i \ge 0$, $i \in \mathcal{I}$. Suppose that

$$\mathcal{J}(x_0) = \{i : x_0^i = 0,\ i \in \mathcal{I}\}.$$

We shall describe now the procedure for one iteration with the initial point $x_0$. We calculate the set $\mathcal{J}(x_0)$. Two cases are possible.

(1) $(f'(x_0))^i = 0$, $i \notin \mathcal{J}(x_0)$, where $(f'(x_0))^i$ is the $i$-th component of vector $f'(x_0)$.

In this case point $x_0$ is the minimum point of $f(x)$ with constraints $x^i = 0$, $i \in \mathcal{J}(x_0)$. If at the same time $(f'(x_0))^i \ge 0$ with $i \in \mathcal{J}(x_0)$, then $x_0$ is the solution of the problem, for at point $x_0$ the necessary and sufficient conditions for a minimum are satisfied (see Chap. I, Sec. 3).

Let now $(f'(x_0))^i < 0$ for some $i \in \mathcal{J}(x_0)$, and set

$$\mathcal{J}' = \{i \in \mathcal{J}(x_0) : (f'(x_0))^i \ge 0\}.$$

We apply the method of conjugate gradients to minimize $f(x)$, taking as variables only $x^i$, $i \notin \mathcal{J}'$, and taking all the $x^i$ for $i \in \mathcal{J}'$ to be zero. The method of conjugate gradients requires that the step length $\alpha_{k+1}$ along the direction $p_{k+1}$ be computed. In addition we compute

$$\bar{\alpha}_{k+1} = \min_i \left\{-\frac{x_k^i}{p_{k+1}^i}\right\},$$

where the minimum is taken over all $i \notin \mathcal{J}'$ for which $p_{k+1}^i < 0$. Then we compare $\alpha_{k+1}$ and $\bar{\alpha}_{k+1}$.

If $\alpha_{k+1} < \bar{\alpha}_{k+1}$, then $x_{k+1}^i = x_k^i + \alpha_{k+1} p_{k+1}^i$, $i \notin \mathcal{J}'$, and $x_{k+1}^i = x_k^i = 0$, $i \in \mathcal{J}'$. If $\alpha_{k+1} \ge \bar{\alpha}_{k+1}$, then $x_{k+1}^i = x_k^i + \bar{\alpha}_{k+1} p_{k+1}^i$, $i \notin \mathcal{J}'$, and $x_{k+1}^i = x_k^i = 0$, $i \in \mathcal{J}'$. The calculation process will be terminated after a finite number of steps, and a point $x_{k+1}$ will be found such that either it provides the minimum of $f(x)$ subject to $x^i = 0$, $i \in \mathcal{J}'$, or $\bar{\alpha}_{k+1} \le \alpha_{k+1}$. Here $\mathcal{J}(x_0) \supset \mathcal{J}'$, and the inclusion is strict: there are indices $i \in \mathcal{J}(x_0)$ with $i \notin \mathcal{J}'$. In both cases point $x_{k+1}$ is taken as the initial point and the process is repeated.

(2) There are indices $i$ such that $(f'(x_0))^i \ne 0$, $i \notin \mathcal{J}(x_0)$. In this case we apply the method of conjugate gradients to minimize $f(x)$ with variables $x^i$, $i \notin \mathcal{J}(x_0)$. The components $x^i$, $i \in \mathcal{J}(x_0)$, remain equal to zero all the time. Moreover, as in case (1), at every step we calculate the quantity

$$\bar{\alpha}_{k+1} = \min_i \left\{-\frac{x_k^i}{p_{k+1}^i}\right\},$$

where the minimum is taken over all $i \notin \mathcal{J}(x_0)$ with $p_{k+1}^i < 0$. The process stops in the same way as in case (1).

It is easily seen that an argument analogous to that given in the subsection on p. 101 results either in the proof of the convergence of the algorithm after a finite number of steps or in establishing the fact that $f(x)$ has no lower bound under the conditions $x^i \ge 0$, $i \in \mathcal{I}$.
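The comparison of the conjugate gradient step with the largest step that keeps the free variables nonnegative can be sketched as follows. This is an illustrative fragment only: the names `max_step_to_boundary` and `bounded_step` are hypothetical, indices are 0-based, `free` lists the variables that are allowed to move, and the conjugate gradient step length `alpha_cg` is assumed to be supplied by the caller.

```python
import math


def max_step_to_boundary(x, p, free):
    """Largest alpha with x[i] + alpha * p[i] >= 0 for all free i with p[i] < 0."""
    ratios = [-x[i] / p[i] for i in free if p[i] < 0]
    return min(ratios) if ratios else math.inf


def bounded_step(x, p, free, alpha_cg):
    """Take the conjugate gradient step unless it would leave the region x >= 0."""
    alpha_bar = max_step_to_boundary(x, p, free)
    hit_boundary = not (alpha_cg < alpha_bar)
    alpha = alpha_bar if hit_boundary else alpha_cg
    x_new = list(x)
    for i in free:
        x_new[i] = x[i] + alpha * p[i]   # fixed variables are left untouched
    return x_new, alpha, hit_boundary


x = [1.0, 2.0, 0.0]
p = [-2.0, 1.0, -1.0]
full = bounded_step(x, p, free=[0, 1], alpha_cg=1.0)    # step is cut at the boundary
short = bounded_step(x, p, free=[0, 1], alpha_cg=0.25)  # conjugate gradient step is kept
```

When `hit_boundary` is true, a new variable has reached zero, so the set of fixed indices changes and the process is restarted from the new point.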
2. METHOD OF FEASIBLE DIRECTIONS


The method of feasible directions was one of the first methods suggested for solving the problem of convex programming.

Suppose it is required to minimize function $f_0(x)$ with the constraints

$$f_i(x) \le 0,\quad i = 1, \ldots, m,\qquad Ax - b = 0, \tag{2.1}$$

where $x \in E^n$, $f_i(x)$, $i = 0, 1, \ldots, m$, are convex continuously differentiable functions, $A$ is an $l \times n$ matrix, and $b$ is an $l$-dimensional vector. Moreover, suppose that the gradients of functions $f_i(x)$, $i = 0, 1, \ldots, m$, satisfy Lipschitz' condition

$$\|f_i'(x) - f_i'(y)\| \le C\,\|x - y\|$$

and $\|f_i'(x)\| \le K$ for all points $x$, $y$ which are considered in what follows. We denote by $D$ the admissible region, i.e. the set

$$D = \{x : f_i(x) \le 0,\ i = 1, \ldots, m,\ Ax - b = 0\}.$$

We shall assume in what follows that set $D$ is compact and that the condition of boundedness of the gradients is fulfilled. Let $x_0$ be a point of $D$. We find a direction $p \in E^n$ such that for small $\alpha > 0$ we have $x_0 + \alpha p \in D$ and, besides, $f_0(x_0 + \alpha p) < f_0(x_0)$. Such a direction is called feasible. Moving along this direction by one step $\alpha_0$, we obtain a new point $x_1 = x_0 + \alpha_0 p \in D$. We take this point as the initial point and the process is repeated. The problem now consists in working out an effective method of finding feasible directions and choosing step $\alpha$ so as to provide for convergence to the minimum point.

Below we always assume that the following condition of nondegeneracy is fulfilled: there is a point $\bar{x}$ such that

$$A\bar{x} - b = 0,\qquad f_i(\bar{x}) < 0,\quad i = 1, \ldots, m.$$
Method of Choosing Feasible Directions

Let

$$\mathcal{J}_\delta(x) = \{i : f_i(x) \ge -\delta,\ i = 1, \ldots, m\}$$

for each point $x \in D$. Let $\xi_i > 0$, $i = 0, 1, \ldots, m$, be arbitrary numbers. Consider the following problem at each point $x \in D$:

$$\min \eta,\qquad (f_i'(x), p) \le \xi_i \eta,\ \ i \in \mathcal{J}_\delta(x) \cup \{0\},\qquad Ap = 0,\qquad \|p\| \le 1, \tag{2.3}$$

where $\eta$ is a number and $\|p\|$ an arbitrary norm. To make (2.3) a problem of linear programming it is convenient to take as a norm

$$\|p\| = \max_{1 \le j \le n} |p^j|.$$

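Let $p_\delta(x)$, $\eta_\delta(x)$ be a solution of problem (2.3). Since the pair $p = 0$, $\eta = 0$ satisfies constraints (2.3), clearly we have $\eta_\delta(x) \le 0$. We demonstrate that $p_\delta(x)$ is a feasible direction if $\eta_\delta(x) < 0$.

Indeed, let $\alpha > 0$. For $i = 0$ we have by Taylor's formula

$$\begin{aligned} f_0(x + \alpha p_\delta(x)) &= f_0(x) + \alpha\,(f_0'(\theta_0), p_\delta(x)) \\ &= f_0(x) + \alpha\,(f_0'(x), p_\delta(x)) + \alpha\,(f_0'(\theta_0) - f_0'(x), p_\delta(x)) \\ &\le f_0(x) + \alpha\,(f_0'(x), p_\delta(x)) + \alpha^2 C\,\|p_\delta(x)\|^2, \end{aligned}$$

where $\theta_0 = x + \tau_0 \alpha p_\delta(x)$, $0 \le \tau_0 \le 1$, and where we have used the fact that

$$\|f_0'(\theta_0) - f_0'(x)\| \le C\,\|\theta_0 - x\| \le C\alpha\,\|p_\delta(x)\|.$$

Further, by (2.3), $(f_0'(x), p_\delta(x)) \le \xi_0 \eta_\delta(x)$. Therefore

$$f_0(x + \alpha p_\delta(x)) \le f_0(x) + \alpha \xi_0 \eta_\delta(x) + \alpha^2 C\,\|p_\delta(x)\|^2. \tag{2.4}$$

By analogy, for $i \in \mathcal{J}_\delta(x)$

$$f_i(x + \alpha p_\delta(x)) \le f_i(x) + \alpha \xi_i \eta_\delta(x) + \alpha^2 C\,\|p_\delta(x)\|^2. \tag{2.5}$$

Further, for $i \notin \mathcal{J}_\delta(x)$

$$f_i(x + \alpha p_\delta(x)) = f_i(x) + \alpha\,(f_i'(\theta_i), p_\delta(x)) \le f_i(x) + \alpha K\,\|p_\delta(x)\|. \tag{2.6}$$

We now choose $\alpha > 0$ so as to satisfy the inequalities

$$\begin{aligned} &f_0(x + \alpha p_\delta(x)) \le f_0(x) + \tfrac{1}{2}\alpha \xi_0 \eta_\delta(x), \\ &f_i(x + \alpha p_\delta(x)) \le 0,\ \ i \in \mathcal{J}_\delta(x), \qquad f_i(x + \alpha p_\delta(x)) \le 0,\ \ i \notin \mathcal{J}_\delta(x). \end{aligned} \tag{2.7}$$

To satisfy these inequalities it is sufficient that

$$\begin{aligned} &\frac{1}{2} + \frac{\alpha C\,\|p_\delta(x)\|^2}{\xi_0 \eta_\delta(x)} \ge 0, \qquad \xi_i \eta_\delta(x) + \alpha C\,\|p_\delta(x)\|^2 \le 0,\ \ i \in \mathcal{J}_\delta(x), \\ &-\delta + \alpha K\,\|p_\delta(x)\| \le 0,\ \ i \notin \mathcal{J}_\delta(x), \end{aligned} \tag{2.8}$$

be true since, by (2.4), (2.5), the following inequalities hold:

$$f_0(x + \alpha p_\delta(x)) \le f_0(x) + \alpha \xi_0 \eta_\delta(x) \left[\frac{1}{2} + \left(\frac{1}{2} + \frac{\alpha C\,\|p_\delta(x)\|^2}{\xi_0 \eta_\delta(x)}\right)\right],$$

$$f_i(x + \alpha p_\delta(x)) \le f_i(x) + \alpha \left[\xi_i \eta_\delta(x) + \alpha C\,\|p_\delta(x)\|^2\right],\quad i \in \mathcal{J}_\delta(x),$$

and since $f_i(x) < -\delta$ for $i \notin \mathcal{J}_\delta(x)$, then by (2.6)

$$f_i(x + \alpha p_\delta(x)) \le -\delta + \alpha K\,\|p_\delta(x)\|.$$

From inequalities (2.8) we obtain:

$$\alpha \le -\frac{\xi_0 \eta_\delta(x)}{2C\,\|p_\delta(x)\|^2}, \qquad \alpha \le -\frac{\xi_i \eta_\delta(x)}{C\,\|p_\delta(x)\|^2},\ \ i \in \mathcal{J}_\delta(x), \qquad \alpha \le \frac{\delta}{K\,\|p_\delta(x)\|}. \tag{2.9}$$

Thus if $\alpha$ satisfies inequalities (2.9), then inequalities (2.7) are fulfilled, and it follows that $p_\delta(x)$ is really a feasible direction, for $A p_\delta(x) = 0$ and therefore $A(x + \alpha p_\delta(x)) - b = Ax - b + \alpha A p_\delta(x) = 0$. We now show that if point $x$ does not coincide with the solution $x_*$ of the problem, i.e. with the minimum point of $f_0(x)$ in region $D$, then $\eta_\delta(x) < 0$ for any sufficiently small $\delta$.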
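Lemma 2.1. Let $x \in D$ be not a solution of the problem of minimization of $f_0(x)$ with constraints (2.1). Then $\eta_\delta(x) < 0$ for any sufficiently small $\delta$.

Proof. Recall that we assume the condition of nondegeneracy to be fulfilled, i.e. there is a point $\bar{x}$ such that

$$A\bar{x} - b = 0,\qquad f_i(\bar{x}) \le \sigma,\quad i = 1, \ldots, m,\qquad \sigma < 0. \tag{2.10}$$

Let point $x_*$ be the solution of the problem. Further, if

$$\mathcal{J}_0(x) = \{i : f_i(x) = 0,\ i = 1, \ldots, m\},$$

then with $\delta < \delta_0$, where

$$\delta_0 = -\max_{i \notin \mathcal{J}_0(x)} f_i(x),$$

we have $\mathcal{J}_\delta(x) = \mathcal{J}_0(x)$. Indeed, if $i \in \mathcal{J}_0(x)$, then $f_i(x) = 0 \ge -\delta$, i.e. $i \in \mathcal{J}_\delta(x)$. But for all $i \notin \mathcal{J}_0(x)$ we have $f_i(x) \le -\delta_0 < -\delta$, i.e. $i \notin \mathcal{J}_\delta(x)$. Suppose that $\delta < \delta_0$ and so $\mathcal{J}_\delta(x) = \mathcal{J}_0(x)$. We set

$$x_\rho = \rho \bar{x} + (1 - \rho) x_*,\qquad 0 < \rho < 1.$$

Then due to the convexity of functions $f_i(x)$, $i = 0, 1, \ldots, m$, and the fact that $f_i(x_*) \le 0$, $i = 1, \ldots, m$, we obtain

$$f_i(x_\rho) \le \rho f_i(\bar{x}) + (1 - \rho) f_i(x_*) \le \rho\sigma,\quad i = 1, \ldots, m.$$

Further, for $i \in \mathcal{J}_\delta(x)$ we have $f_i(x) = 0$ and therefore, for $0 \le \lambda \le 1$,

$$\begin{aligned} \lambda \rho \sigma \ge \lambda f_i(x_\rho) &= \lambda f_i(x_\rho) + (1 - \lambda) f_i(x) \ge f_i(\lambda x_\rho + (1 - \lambda) x) \\ &= f_i(x + \lambda (x_\rho - x)) - f_i(x) \ge \lambda\,(f_i'(x), x_\rho - x), \end{aligned}$$

where we have used the inequality (Chap. I, Sec. 2)

$$f(y) - f(x) \ge (f'(x), y - x),$$

which holds for any differentiable convex function.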
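Thus

$$\rho\sigma \ge (f_i'(x), x_\rho - x),\quad i \in \mathcal{J}_\delta(x). \tag{2.11}$$

Further, since point $x$ does not provide a minimum of $f_0(x)$ in $D$,

$$0 > \gamma = f_0(x_*) - f_0(x) \ge (f_0'(x), x_* - x).$$

Hence

$$\begin{aligned} (f_0'(x), x_\rho - x) &= \rho\,(f_0'(x), \bar{x} - x) + (1 - \rho)\,(f_0'(x), x_* - x) \\ &\le \rho\,(f_0'(x), \bar{x} - x) + (1 - \rho)\gamma. \end{aligned} \tag{2.12}$$

It follows from (2.11) and (2.12) that with sufficiently small $\rho > 0$ the following inequalities are satisfied:

$$(f_0'(x), \bar{p}_\rho) < 0,\qquad (f_i'(x), \bar{p}_\rho) < 0,\quad i \in \mathcal{J}_\delta(x), \tag{2.13}$$

where $\bar{p}_\rho = x_\rho - x$, and the fact that $\sigma < 0$, $\gamma < 0$ has been taken into account. Take $p_\rho = \bar{p}_\rho$ if $\|\bar{p}_\rho\| \le 1$, and $p_\rho = \bar{p}_\rho / \|\bar{p}_\rho\|$ if $\|\bar{p}_\rho\| > 1$, so that $\|p_\rho\| \le 1$. Besides, we take

$$\eta_\rho = \max_{i \in \mathcal{J}_\delta(x) \cup \{0\}} \frac{(f_i'(x), p_\rho)}{\xi_i}.$$

By (2.13), $\eta_\rho < 0$ and the inequalities

$$(f_i'(x), p_\rho) \le \xi_i \eta_\rho,\quad i \in \mathcal{J}_\delta(x) \cup \{0\},\qquad \eta_\rho < 0, \tag{2.14}$$

are satisfied. Further, since $x_\rho = \rho \bar{x} + (1 - \rho) x_*$ and the equalities

$$A\bar{x} - b = 0,\qquad Ax_* - b = 0$$

hold, then

$$Ax_\rho - b = 0.$$

Note that $p_\rho = a \bar{p}_\rho = a\,(x_\rho - x)$, where $0 < a \le 1$. Therefore

$$Ap_\rho + (Ax - b) = a\,[Ax_\rho - b] + (1 - a)\,[Ax - b] = 0, \tag{2.15}$$

and since $Ax - b = 0$, we have $Ap_\rho = 0$. It follows from (2.14) and (2.15) that vector $p_\rho$ and the quantity $\eta_\rho$ satisfy conditions (2.3). Since $\eta_\rho < 0$, all the more $\eta_\delta(x) < 0$, for $\eta_\delta(x) \le \eta_\rho$ by definition. The lemma is proved.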


Algorithm of Method
of Feasible Directions


Let $x_0 \in D$ be an arbitrary first approximation and $\delta_0 > 0$. We describe the general step of the algorithm. Let point $x_k \in D$ be obtained at the $k$-th step and $\delta_k > 0$.

Having solved the problem

$$\min \eta,\qquad (f_i'(x_k), p) \le \xi_i \eta,\ \ i \in \mathcal{J}_{\delta_k}(x_k) \cup \{0\},\qquad Ap = 0,\qquad \|p\| \le 1,$$

we obtain $p_{\delta_k}(x_k) = p_k$ and $\eta_{\delta_k}(x_k) = \eta_k$.

Remark. If we take the quantity $\max_{1 \le j \le n} |p^j|$ as the norm of vector $p$, then the above problem is a problem of linear programming and can be solved by one of the standard methods.

Two cases are possible.

(1) $\eta_k < -\delta_k$. We take successively $\alpha = \dfrac{1}{2^i}$, $i = 0, 1, \ldots$, and find the first $i_k$ such that the following inequalities

$$f_0\!\left(x_k + \frac{1}{2^i}\,p_k\right) \le f_0(x_k) + \frac{1}{2} \cdot \frac{1}{2^i}\,\xi_0 \eta_k,\qquad f_j\!\left(x_k + \frac{1}{2^i}\,p_k\right) \le 0,\quad j = 1, \ldots, m,$$

are satisfied. We take $\alpha_k = \dfrac{1}{2^{i_k}}$ and $x_{k+1} = x_k + \alpha_k p_k$, $\delta_{k+1} = \delta_k$, so that

$$f_0(x_{k+1}) \le f_0(x_k) + \frac{1}{2}\,\alpha_k \xi_0 \eta_k,\qquad f_j(x_{k+1}) \le 0,\quad j = 1, \ldots, m. \tag{2.16}$$

(2) $\eta_k \ge -\delta_k$. We take

$$x_{k+1} = x_k,\qquad \delta_{k+1} = \frac{1}{2}\,\delta_k.$$

Thus in the first case there is a shift to a new point; in the second one there is no such shift.

We now formulate also the halting condition of the algorithm: if at a certain step $k$ we have $\delta_k < \delta^0(x_k)$, where

$$\delta^0(x_k) = -\max_{i \notin \mathcal{J}_0(x_k)} f_i(x_k),$$

and $\eta_k = 0$, then $x_k$ is the solution of the problem set above, i.e. $x_k$ is the minimum point of $f_0(x)$ with constraints (2.1).
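One general step of the algorithm can be sketched as follows, assuming that the direction $p_k$ and the value $\eta_k$ have already been obtained from the linear program (the sketch does not solve it). The function name, the argument layout and the cap on the number of halvings are assumptions made for illustration only.

```python
def feasible_direction_step(x, p, eta, delta, f0, constraints, xi0, max_halvings=60):
    """Return (x_next, delta_next) for one step, given p_k and eta_k."""
    if eta >= -delta:
        # Case (2): no shift, the tolerance delta is halved.
        return list(x), delta / 2
    # Case (1): try alpha = 1, 1/2, 1/4, ... until the inequalities (2.16) hold.
    f0_x = f0(x)
    alpha = 1.0
    for _ in range(max_halvings):
        trial = [a + alpha * b for a, b in zip(x, p)]
        decreases = f0(trial) <= f0_x + 0.5 * alpha * xi0 * eta
        feasible = all(f(trial) <= 0 for f in constraints)
        if decreases and feasible:
            return trial, delta
        alpha /= 2
    raise RuntimeError("no acceptable step found within max_halvings trials")


# f_0 = (x^1)^2 + (x^2)^2 with the single constraint 0.25 - x^1 <= 0.
f0 = lambda x: x[0] ** 2 + x[1] ** 2
constraints = [lambda x: 0.25 - x[0]]
moved = feasible_direction_step([1.0, 1.0], [-1.0, -1.0], -2.0, 0.5, f0, constraints, 1.0)
stayed = feasible_direction_step([1.0, 1.0], [-1.0, -1.0], -0.1, 0.5, f0, constraints, 1.0)
```

In the first call the full step $\alpha = 1$ violates the constraint, so the step is halved once; in the second call $\eta_k \ge -\delta_k$, so the point is kept and only $\delta$ is halved.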
Substantiation of Convergence
of the Algorithm


We show that if the sequence $\{x_k\}$ is terminated at a certain step $k$ because the halting conditions have been fulfilled, then $x_k$ is really the solution of the problem. Indeed, let the halting conditions be fulfilled, i.e. $\eta_k = \eta_{\delta_k}(x_k) = 0$, and

$$\delta_k < \delta^0(x_k) = -\max_{i \notin \mathcal{J}_0(x_k)} f_i(x_k). \tag{2.17}$$

But as was shown in proving lemma 2.1, if condition (2.17) is fulfilled, then $\eta_{\delta_k}(x_k) < 0$ provided $x_k$ is not the solution of the problem. But it was assumed that $\eta_k = 0$, so it follows that $x_k$ is the minimum point of $f_0(x)$ with $x \in D$.

Let the iterative process now be continued without limit so that we have an infinite sequence $\{x_k\}$, $k = 0, 1, \ldots$. Let $x_k$ be a point at which $\eta_k < -\delta_k$, i.e. case (1) takes place. Then making use of estimates (2.9) and the fact that $\|p_k\| = \|p_{\delta_k}(x_k)\| \le 1$, we can state that if

$$\alpha \le -\frac{\xi_0 \eta_k}{2C},\qquad \alpha \le -\frac{\xi_i \eta_k}{C},\ \ i \in \mathcal{J}_k,\qquad \alpha \le \frac{\delta_k}{K}, \tag{2.18}$$

where to shorten the notation we set $\mathcal{J}_k = \mathcal{J}_{\delta_k}(x_k)$, then the inequalities (2.7) will hold:

$$f_0(x_k + \alpha p_k) \le f_0(x_k) + \frac{1}{2}\,\alpha \xi_0 \eta_k,\qquad f_i(x_k + \alpha p_k) \le 0,\quad i = 1, \ldots, m.$$

Now, recall that according to the algorithm, quantity $\alpha_k$ coincides with the first of the quantities $\dfrac{1}{2^i}$, $i = 0, 1, \ldots$, satisfying inequalities (2.16). It follows that after a finite number of trials these inequalities will be satisfied.

Let $i_k$ be the first of the indices satisfying the inequalities, so that $\alpha_k = \dfrac{1}{2^{i_k}}$. This means that with $\alpha = \dfrac{1}{2^{i_k - 1}}$ the inequalities (2.16) were not satisfied and therefore $\alpha$ did not satisfy (2.18), i.e.

$$\frac{1}{2^{i_k - 1}} > \min\left\{-\frac{\xi_0 \eta_k}{2C},\ \min_{i \in \mathcal{J}_k}\left(-\frac{\xi_i \eta_k}{C}\right),\ \frac{\delta_k}{K}\right\}.$$
Therefore

$$\alpha_k = \frac{1}{2^{i_k}} > \frac{1}{2}\min\left\{-\frac{\xi_0 \eta_k}{2C},\ \min_{i \in \mathcal{J}_k}\left(-\frac{\xi_i \eta_k}{C}\right),\ \frac{\delta_k}{K}\right\}. \tag{2.19}$$

If we take into account that in the case under consideration $-\eta_k > \delta_k$, then inequality (2.19) can be simplified by substituting $\delta_k$ for $-\eta_k$. Then we obtain:

$$\alpha_k \ge \varepsilon_0 \delta_k,\qquad \varepsilon_0 = \frac{1}{2}\min\left\{\frac{\xi_0}{2C},\ \frac{\xi_1}{C},\ \ldots,\ \frac{\xi_m}{C},\ \frac{1}{K}\right\}. \tag{2.20}$$

Using now inequalities (2.16), (2.20) and the fact that $\eta_k < -\delta_k < 0$, we obtain

$$f_0(x_{k+1}) \le f_0(x_k) + \frac{1}{2}\,\alpha_k \xi_0 \eta_k \le f_0(x_k) - \frac{1}{2}\,\varepsilon_0 \xi_0 \delta_k^2. \tag{2.21}$$

It follows from this inequality that $\delta_k \to 0$ as $k \to \infty$. Indeed, the sequence $\{\delta_k\}$, $k = 0, 1, \ldots$, is monotonically non-increasing, and if $\delta_{k+1} < \delta_k$, then $\delta_{k+1} = \frac{1}{2}\delta_k$; hence the fact that $\delta_k$ does not tend to zero can mean only that $\delta_k = \delta > 0$ for all sufficiently large $k$. But if $\delta_k$ remains constant, then the condition $\eta_k < -\delta_k$ is fulfilled and thus the inequality (2.21) holds.

Thus for all sufficiently large $k$ ($k \ge k_0$), $\delta_k = \delta$ and the inequality

$$f_0(x_{k+1}) \le f_0(x_k) - \frac{1}{2}\,\varepsilon_0 \xi_0 \delta^2$$

is fulfilled. Therefore

$$f_0(x_N) \le f_0(x_{k_0}) - (N - k_0)\,\frac{\varepsilon_0 \xi_0}{2}\,\delta^2,$$

i.e. $f_0(x_N) \to -\infty$ as $N \to \infty$. But this contradicts the fact that the continuous function $f_0(x)$ is bounded in the compact region $D$.

Thus we have shown that $\delta_k \to 0$. This means that the initial $\delta_0$ is successively halved an infinite number of times, i.e. that case (2), $\eta_k \ge -\delta_k$, takes place an infinite number of times.

Let $\mathcal{K}$ be the set of indices $k$ for which case (2) took place. Then $\eta_k \to 0$ as $k \to \infty$, $k \in \mathcal{K}$. This follows immediately from the inequality $-\delta_k \le \eta_k \le 0$ and from the fact that $\delta_k \to 0$. Consider the sequence of points $x_k \in D$, $k \in \mathcal{K}$. As $D$ is a compact set, one can assume, without loss of generality, that $\{x_k\}$, $k \in \mathcal{K}$, converges to a certain point $x_*$. We shall demonstrate now that $x_*$ is the minimum point of $f_0(x)$ in $D$.
Suppose that the opposite is true, i.e. that point $x_*$ is not the minimum point of $f_0(x)$ in $D$. Then on the basis of lemma 2.1 we can affirm that with all $\delta < \delta^0(x_*)$, where

$$\delta^0(x_*) = -\max_{i \notin \mathcal{J}_0(x_*)} f_i(x_*),$$

we have $\mathcal{J}_\delta(x_*) = \mathcal{J}_0(x_*)$ and $\eta_\delta(x_*) < 0$. Moreover, since $\mathcal{J}_\delta(x_*) = \mathcal{J}_0(x_*)$, we have $\eta_\delta(x_*) = \eta_0(x_*) < 0$. Further, $\mathcal{J}_{\delta_k}(x_k) \subseteq \mathcal{J}_0(x_*)$ with sufficiently large $k \in \mathcal{K}$. Indeed, suppose that $i \notin \mathcal{J}_0(x_*)$; then $f_i(x_*) < 0$. Therefore, because $\delta_k \to 0$, with sufficiently large $k$ we have $f_i(x_*) < -\delta_k$, and since $x_k \to x_*$, with large $k$ we also have $f_i(x_k) < -\delta_k$, i.e. $i \notin \mathcal{J}_{\delta_k}(x_k)$. Thus if $i \notin \mathcal{J}_0(x_*)$, then with large $k$ we have $i \notin \mathcal{J}_{\delta_k}(x_k)$, i.e. $\mathcal{J}_{\delta_k}(x_k) \subseteq \mathcal{J}_0(x_*)$. Since by assumption $x_*$ is not the minimum point of $f_0(x)$ in $D$, there is a vector $p(x_*)$ such that $Ap(x_*) = 0$, $\|p(x_*)\| \le 1$,

$$(f_i'(x_*), p(x_*)) \le \xi_i \eta_0(x_*),\quad i \in \mathcal{J}_0(x_*) \cup \{0\},$$

and, as mentioned above, $\eta_0(x_*) < 0$. But then by continuity the following relations hold with large $k$:

$$(f_i'(x_k), p(x_*)) \le \frac{1}{2}\,\xi_i \eta_0(x_*),\quad i \in \mathcal{J}_k \cup \{0\},\qquad Ap(x_*) = 0,\qquad \|p(x_*)\| \le 1,$$

for $x_k \to x_*$ and $\mathcal{J}_k \subseteq \mathcal{J}_0(x_*)$. However, the last relations mean that

$$\eta_k = \eta_{\delta_k}(x_k) \le \frac{1}{2}\,\eta_0(x_*) < 0$$

with all sufficiently large $k$, and this contradicts the fact that $\eta_k \to 0$, $k \to \infty$, $k \in \mathcal{K}$, which was proved above. The contradiction obtained proves that $x_*$ is the minimum point of $f_0(x)$ in $D$.

Theorem 2.1. The sequence of points $\{x_k\}$ constructed by the method of feasible directions has the property that $f_0(x_k)$ is monotonically non-increasing and tends to $f_0(x_*)$, where $x_*$ is the minimum point of $f_0(x)$ in region $D$.

Proof. By construction of sequence $\{x_k\}$ we have $f_0(x_{k+1}) \le f_0(x_k)$, and so this sequence of numbers is monotonically non-increasing. Since it has a lower bound, it converges to a certain limit $\bar{f}_0$. However, it was shown above that there is a subsequence of $\{x_k\}$, $k \in \mathcal{K}$, such that $x_k \to x_*$. Therefore $f_0(x_k) \to f_0(x_*)$ along this subsequence. As the whole sequence converges to the same limit as the subsequence, it follows that $f_0(x_k) \to f_0(x_*)$. Q.E.D.

Remark 1. Among the constraints $f_i(x) \le 0$ there can be some for which functions $f_i(x)$ are linear. It is easy to show by a slight extension of the preceding argument that for these indices $i$ we can take $\xi_i = 0$. Besides, in this case the condition of nondegeneracy can also be weakened, viz. it suffices to require that there be a point $\bar{x} \in D$ such that $f_i(\bar{x}) < 0$ only for those indices $i$ for which $f_i(x)$ are nonlinear.

Remark 2. Sequence $\{x_k\}$ itself, generally speaking, may fail to converge; however, if the minimum point $x_*$ of $f_0(x)$ with $x \in D$ is unique, it can easily be seen that $x_k \to x_*$. Unfortunately, the rate of convergence of the method of feasible directions is as yet unknown.


Construction
of the Initial Approximation


The application of the method of feasible directions requires the knowledge of an initial approximation in region $D$. To obtain this initial approximation we can use the same method of feasible directions by applying it to the problem of minimization of the number $\eta$ with constraints

$$f_i(x) - \eta \le 0,\quad i = 1, \ldots, m,\qquad Ax - b = 0. \tag{2.22}$$

As there is a point $\bar{x}$ such that

$$f_i(\bar{x}) < 0,\quad i = 1, \ldots, m,\qquad A\bar{x} - b = 0,$$

the minimum value of $\eta$ with the constraints described is strictly less than zero, and therefore after a finite number of steps we obtain a point $x$ and a number $\eta$ such that $\eta \le 0$ and the inequalities (2.22) are satisfied. This means that the obtained point $x$ satisfies the constraints of the original problem and can be taken as the initial point for applying the method of feasible directions.


3. METHOD OF CONDITIONAL GRADIENT
AND NEWTON'S METHOD


The method of conditional gradient can be used for solving the problem of the minimization of a nonlinear function in a region in which the problem of the minimization of a linear function can be solved without great difficulty.

Suppose that $f(x)$, $x \in E^n$, is a continuously differentiable function in a compact convex region $\Omega$, and the gradient $f'(x)$ of function $f(x)$ in $\Omega$ satisfies Lipschitz' condition, i.e.

$$\|f'(x_1) - f'(x_2)\| \le L\,\|x_1 - x_2\| \tag{3.1}$$

for all the points of region $\Omega$.




The method of conditional gradient consists in the following.

Let $x_k$, the approximation at the $k$-th step of the iterative process, be already constructed. Calculate $f'(x_k)$ and find the minimum point of the linear function $(f'(x_k), z)$ in $\Omega$. Let it be point $z(x_k)$. Take $p_k = z(x_k) - x_k$ and $x_{k+1} = x_k + \alpha_k p_k$, where $\alpha_k > 0$ is the length of the step in direction $p_k$. Point $x_{k+1}$ is taken as the initial one and the procedure is then repeated.

It will be demonstrated below that with a definite rule of calculating $\alpha_k$ the process converges, and bounds on the rate of convergence will be established. The same problems will be analyzed for Newton's method, which differs from the method of conditional gradient in that the function being minimized is approximated at each iteration by a quadratic form (while in the method of conditional gradient the approximation is linear).


Rule for Choosing the Step Length


Let $x$ be an arbitrary point in $\Omega$. We denote by $z(x)$ a minimum point of function $(f'(x), z)$ in $\Omega$, so that

$$(f'(x), z(x)) \le (f'(x), z),\quad z \in \Omega. \tag{3.2}$$

We take $p(x) = z(x) - x$,

$$\eta(x) = \min_{z \in \Omega}\,(f'(x), z - x) = (f'(x), p(x)).$$

By (3.2), $\eta(x) \le 0$. We are interested in an estimate for the change of the function value in moving from point $x$ in direction $p(x)$. Using Taylor's formula and (3.1) we obtain:

$$\begin{aligned} f(x + \alpha p(x)) &= f(x) + \alpha\,(f'(\theta), p(x)) \\ &= f(x) + \alpha\,(f'(x), p(x)) + \alpha\,(f'(\theta) - f'(x), p(x)) \\ &\le f(x) + \alpha \eta(x) + \alpha^2 L\,\|p(x)\|^2, \end{aligned}$$

where $\theta = x + \xi \alpha p(x)$, $0 \le \xi \le 1$. Thus

$$f(x + \alpha p(x)) \le f(x) + \alpha\left(\eta(x) + \alpha L\,\|p(x)\|^2\right). \tag{3.3}$$

It follows directly from this formula that with

$$0 \le \alpha \le -\frac{\eta(x)}{2L\,\|p(x)\|^2} \tag{3.4}$$

the following estimate holds:

$$f(x + \alpha p(x)) \le f(x) + \frac{1}{2}\,\alpha \eta(x). \tag{3.5}$$
Description of the Algorithm


The algorithm begins with an arbitrary point $x_0$ of region $\Omega$.

We describe now the general step.

Let point $x_k$ be already constructed, $k \ge 0$. Having solved the problem of minimization of $(f'(x_k), z)$ in $\Omega$, calculate $z(x_k)$, $p(x_k)$, $\eta(x_k)$. Construct point $x_{k+1} = x_k + \alpha_k p(x_k)$, where $\alpha_k$ is taken to be $2^{-i_k}$ and $i_k$ is the first of the indices $i = 0, 1, \ldots$ to satisfy the inequality

$$f\!\left(x_k + 2^{-i} p(x_k)\right) \le f(x_k) + 2^{-i} \cdot \frac{1}{2}\,\eta(x_k). \tag{3.6}$$

The halting condition: the process stops if $\eta(x_k) = 0$.
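The algorithm is easy to try out when $\Omega$ is a box, because the linear subproblem is then solved coordinate by coordinate. The sketch below assumes the box $[lo, hi]^n$, a small tolerance in place of the exact test $\eta(x_k) = 0$, and caps on the numbers of iterations and halvings; the function name and these parameters are illustrative assumptions, not part of the original description.

```python
def conditional_gradient_box(f, grad, x0, lo, hi, max_iter=100, tol=1e-12,
                             max_halvings=60):
    """Conditional gradient method on the box [lo, hi]^n with step rule (3.6)."""
    x = list(x0)
    history = [f(x)]
    for _ in range(max_iter):
        g = grad(x)
        # Minimizer of the linear function (g, z) over the box.
        z = [lo if gi > 0 else hi for gi in g]
        p = [zi - xi for zi, xi in zip(z, x)]
        eta = sum(gi * pi for gi, pi in zip(g, p))
        if eta >= -tol:                       # halting condition eta(x_k) = 0
            break
        alpha = 1.0
        for _ in range(max_halvings):
            trial = [xi + alpha * pi for xi, pi in zip(x, p)]
            if f(trial) <= history[-1] + alpha * 0.5 * eta:   # inequality (3.6)
                break
            alpha /= 2
        else:
            break                             # no acceptable step was found
        x = trial
        history.append(f(x))
    return x, history


# Minimum at the vertex (0, 0) of the unit square.
f_vertex = lambda x: (x[0] + 1.0) ** 2 + (x[1] + 2.0) ** 2
g_vertex = lambda x: [2.0 * (x[0] + 1.0), 2.0 * (x[1] + 2.0)]
x_vertex, hist_vertex = conditional_gradient_box(f_vertex, g_vertex, [1.0, 1.0], 0.0, 1.0)

# Minimum at (0.3, 0), which lies inside an edge; convergence is slower here.
f_edge = lambda x: (x[0] - 0.3) ** 2 + (x[1] + 2.0) ** 2
g_edge = lambda x: [2.0 * (x[0] - 0.3), 2.0 * (x[1] + 2.0)]
x_edge, hist_edge = conditional_gradient_box(f_edge, g_edge, [1.0, 1.0], 0.0, 1.0)
```

In the first example the minimum is reached after a single full step; in the second the iterates move back and forth between vertices, which is typical for minimization over a polyhedron.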


Substantiation of Convergence
of the Algorithm
and Estimation of Its Rate of Convergence


According to the rule just given for choosing the step length, the following inequality is fulfilled:

$$f(x_{k+1}) \le f(x_k) + \frac{1}{2}\,\alpha_k \eta(x_k). \tag{3.7}$$

In order to substantiate the convergence of the algorithm, it is necessary first of all to demonstrate that inequalities (3.6), (3.7) can always be satisfied. In fact, by (3.4) and (3.5), inequality (3.6) will be satisfied as soon as the inequality

$$2^{-i} \le -\frac{\eta(x_k)}{2L\,\|p(x_k)\|^2}$$

is satisfied, and since $i_k$ is the first index satisfying (3.6), we have

$$2\alpha_k = 2^{-(i_k - 1)} > -\frac{\eta(x_k)}{2L\,\|p(x_k)\|^2},$$

hence

$$\alpha_k \ge \frac{1}{4} \cdot \frac{-\eta(x_k)}{L\,\|p(x_k)\|^2}. \tag{3.8}$$

It follows from the foregoing that if $\eta(x_k) < 0$, then inequality (3.6) will be satisfied after a finite number of trials and the $\alpha_k$ chosen will satisfy inequality (3.8).

Lemma 3.1. If $\{x_k\}$, $k = 0, 1, \ldots$, is a sequence of points obtained in implementing the algorithm of the method of conditional gradient, then $x_k \in \Omega$, $f(x_k)$ decreases monotonically and $\eta(x_k) \to 0$ as $k \to \infty$.
Proof. Let $x_k \in \Omega$ for $k \le m$. We show that $x_{m+1} \in \Omega$. Indeed, $0 < \alpha_m \le 1$ and $z(x_m) \in \Omega$. Therefore

$$x_{m+1} = x_m + \alpha_m p(x_m) = x_m + \alpha_m\,(z(x_m) - x_m) = (1 - \alpha_m)\,x_m + \alpha_m z(x_m) \in \Omega,$$

for region $\Omega$ is convex.

Note now that $\|p(x_k)\|$ is bounded by a certain constant $C$, since $p(x_k) = z(x_k) - x_k$, $z(x_k) \in \Omega$, $x_k \in \Omega$, and $\Omega$ is a compact set. Using now formulas (3.8), (3.7) we obtain

$$f(x_{k+1}) - f(x_k) \le -\frac{1}{8LC^2}\,\eta^2(x_k). \tag{3.9}$$

Adding (3.9) for all $k = 0, 1, \ldots, m - 1$ we obtain

$$f(x_m) - f(x_0) \le -\frac{1}{8LC^2} \sum_{k=0}^{m-1} \eta^2(x_k).$$

Since region $\Omega$ is compact and function $f(x)$ is continuous, we have $f(x_m) \ge f_*$, where $f_*$ is the minimum value of $f(x)$ in $\Omega$. Therefore

$$\sum_{k=0}^{m-1} \eta^2(x_k) \le 8LC^2\,[f(x_0) - f(x_m)] \le 8LC^2\,[f(x_0) - f_*].$$

Hence it follows that the series

$$\sum_{k=0}^{\infty} \eta^2(x_k)$$

converges. This is possible only if $\eta(x_k) \to 0$. The lemma is proved.

It follows from the halting condition of the algorithm and lemma 3.1 that, in general, either the algorithm stops after a finite number of steps and the condition $\eta(x_k) = 0$ is fulfilled, or a monotonically decreasing sequence of values $f(x_k)$ of function $f(x)$ is obtained.

In the first case, the condition $\eta(x_k) = 0$, by (3.2), is equivalent to the following one:

$$(f'(x_k), x_k) = (f'(x_k), z(x_k)) \le (f'(x_k), z),\quad z \in \Omega.$$

The last expression is nothing else but the necessary condition for function $f(x)$ to assume its minimum value at point $x_k$ (see Chap. I, Sec. 3).

The second case is the subject of the following lemma.

Lemma 3.2. At any limit point of sequence $\{x_k\}$, $k = 0, 1, \ldots$, the necessary conditions for the minimum of $f(x)$ in set $\Omega$ are fulfilled.
Proof. Let $x_*$ be a limit point of $\{x_k\}$, i.e. there is a subsequence $\{x_{k_j}\}$, $j \to \infty$, such that $x_{k_j} \to x_*$. The following relations hold:

$$\eta(x_{k_j}) = (f'(x_{k_j}), z(x_{k_j}) - x_{k_j}),$$

$$(f'(x_{k_j}), z(x_{k_j})) \le (f'(x_{k_j}), z),\quad z \in \Omega.$$

Without loss of generality, we can assume that $z(x_{k_j}) \to z_*$. Since $\eta(x_k) \to 0$ and $f'(x)$ depends on $x$ continuously, it follows from the above relations that

$$(f'(x_*), z_* - x_*) = 0,$$

$$(f'(x_*), z_*) \le (f'(x_*), z),\quad z \in \Omega.$$

Hence

$$(f'(x_*), x_*) \le (f'(x_*), z),\quad z \in \Omega,$$

and this is the proof of the lemma.

Theorem 3.1. Let function $f(x)$ be convex. Then

$$\lim_{k \to \infty} f(x_k) = f_*,$$

where $f_* = \min_{x \in \Omega} f(x)$. Moreover, the estimate

$$f(x_k) - f_* \le \frac{C}{k}$$

($C$ is a positive constant) holds.


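Proof. As $f(x)$ is a convex function, the following inequality holds:

$$f_* - f(x) = f(x_*) - f(x) \ge (f'(x), x_* - x) \ge \min_{z \in \Omega}\,(f'(x), z - x) = \eta(x).$$

Thus $0 \le f(x) - f_* \le -\eta(x)$. Therefore for all $k$

$$0 \le f(x_k) - f_* \le -\eta(x_k). \tag{3.10}$$

It follows from lemma 3.1 that $\eta(x_k) \to 0$. Therefore the last inequality shows that $f(x_k) \to f_*$, and this proves the first part of the theorem.

Combining (3.9) and (3.10) we obtain

$$(f(x_{k+1}) - f_*) - (f(x_k) - f_*) \le -\frac{1}{8LC^2}\,[f(x_k) - f_*]^2.$$

With the notation $\varphi_k = f(x_k) - f_*$ we obtain

$$\varphi_{k+1} \le \varphi_k\left(1 - \frac{\varphi_k}{8LC^2}\right)$$

or

$$\varphi_{k+1} \le \varphi_k\,(1 - \varkappa \varphi_k),\qquad \varkappa = \frac{1}{8LC^2}.$$

Taking $\gamma_k = k\varphi_k$ we obtain

$$\frac{\gamma_{k+1}}{k+1} \le \frac{\gamma_k}{k}\left(1 - \frac{\varkappa \gamma_k}{k}\right) \tag{3.11}$$

or

$$\frac{\gamma_{k+1}}{\gamma_k} \le 1 + \frac{1}{k} - \frac{\varkappa\,(k+1)\,\gamma_k}{k^2}.$$

With each $k$ there are two possible cases:

(1) $\dfrac{\gamma_{k+1}}{\gamma_k} \le 1$, i.e. $\gamma_{k+1} \le \gamma_k$.

(2) $\dfrac{\gamma_{k+1}}{\gamma_k} > 1$. Then $\dfrac{1}{k} - \dfrac{\varkappa\,(k+1)\,\gamma_k}{k^2} > 0$, i.e.

$$\gamma_k < \frac{1}{\varkappa} \cdot \frac{k}{k+1} < \frac{1}{\varkappa}.$$

Further, from (3.11) we obtain that

$$\frac{\gamma_{k+1}}{\gamma_k} \le \frac{k+1}{k} \le 2$$

with $k \ge 1$. Now only two situations are possible.

(1) There is only a finite number of indices $k$ for which $\gamma_k < \dfrac{1}{\varkappa}$. Then, due to the above statements, for all large $k$ the sequence $\{\gamma_k\}$ is monotonically non-increasing, i.e. remains bounded.

(2) There is an infinite number of indices $k$ for which $\gamma_k < \dfrac{1}{\varkappa}$. We shall denote the set of such indices $k$ by $\mathcal{K}$, so that $\gamma_k < \dfrac{1}{\varkappa}$ for $k \in \mathcal{K}$. Let now $j \notin \mathcal{K}$. Then there will be two indices $k_1, k_2 \in \mathcal{K}$ such that $k_1 < j < k_2$ and $k \notin \mathcal{K}$ for all $k_1 < k < k_2$. Then

$$\gamma_{k_1 + 1} \le 2\gamma_{k_1}$$

and $\gamma_{i+1} \le \gamma_i$ for all $i = k_1 + 1, \ldots, k_2 - 1$. Therefore

$$\gamma_j \le \gamma_{k_1 + 1} \le 2\gamma_{k_1} < \frac{2}{\varkappa}.$$

This shows that in this case as well the sequence is bounded by a certain constant $C$. It follows that

$$\varphi_k = \frac{\gamma_k}{k} \le \frac{C}{k},$$

and this completes the proof of the theorem.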
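The $C/k$ behaviour can be checked numerically on the extreme case of the recurrence, $\varphi_{k+1} = \varphi_k (1 - \varkappa \varphi_k)$. The sketch below is an illustration only; it assumes $\varkappa \varphi_0 < 1$ so that all terms stay positive, and the function name is hypothetical. For this extreme sequence $1/\varphi_{k+1} \ge 1/\varphi_k + \varkappa$, so $k\varphi_k$ never exceeds $1/\varkappa$.

```python
def worst_case_gaps(phi0, kappa, steps):
    """Iterate phi_{k+1} = phi_k * (1 - kappa * phi_k), the equality case of the bound."""
    phis = [phi0]
    for _ in range(steps):
        phi = phis[-1]
        phis.append(phi * (1.0 - kappa * phi))
    return phis


kappa = 0.5
phis = worst_case_gaps(phi0=1.0, kappa=kappa, steps=1000)
gammas = [k * phis[k] for k in range(1, len(phis))]   # gamma_k = k * phi_k
largest_gamma = max(gammas)
```

The products $\gamma_k = k\varphi_k$ stay below $1/\varkappa$ and do not tend to zero, which is consistent with the remark that the estimate $C/k$ cannot be improved in general.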
The estimate obtained shows that the algorithm does not converge very rapidly. But we have obtained only an upper bound, and it might seem that in reality the algorithm converges at a faster rate. Unfortunately, this is not so in general. As shown by M. D. Canon and C. D. Cullum, the bounds obtained are sharp if the function being minimized is convex on a polyhedron.


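Estimate of Convergence
for a Strongly
Convex Region


Let region $\Omega$ be strongly convex, i.e. there is a number $\delta > 0$ such that for any $x, y \in \Omega$ the points $\dfrac{x + y}{2} + w$ belong to $\Omega$ for all the $w$ such that $\|w\| \le \delta\,\|x - y\|^2$. Then

$$\begin{aligned} \eta(x) = \min_{z \in \Omega}\,(f'(x), z - x) &\le \min_{\|w\| \le \delta \|z(x) - x\|^2} \left(f'(x), \frac{x + z(x)}{2} + w - x\right) \\ &= \frac{1}{2}\,(f'(x), z(x) - x) - \delta\,\|z(x) - x\|^2\,\|f'(x)\|. \end{aligned}$$

Hence

$$\frac{1}{2}\,\eta(x) \le -\delta\,\|f'(x)\|\,\|z(x) - x\|^2$$

or

$$-\frac{\eta(x)}{\|p(x)\|^2} \ge 2\delta\,\|f'(x)\|. \tag{3.12}$$

Theorem 3.2. If $f(x)$ is a convex function and region $\Omega$ is strongly convex, and if $\|f'(x)\| \ge \varepsilon_0 > 0$ for all $x \in \Omega$, then the method of conditional gradient converges at the rate of a geometric progression, i.e. $\|x_k - x_*\| \le Cq^k$, $0 < q < 1$.

Proof. From (3.7) and (3.8) we obtain

$$\varphi_k - \varphi_{k+1} = f(x_k) - f(x_{k+1}) \ge \frac{\eta^2(x_k)}{8L\,\|p(x_k)\|^2}.$$

Using (3.12) and (3.10), we obtain

$$\varphi_k - \varphi_{k+1} \ge \frac{1}{4L}\,\delta \varepsilon_0\,(-\eta(x_k)) \ge \frac{\delta \varepsilon_0}{4L}\,\varphi_k,$$

i.e.

$$\varphi_{k+1} \le \left(1 - \frac{\delta \varepsilon_0}{4L}\right)\varphi_k.$$

Therefore

$$\varphi_m \le \bar{q}^{\,m} \varphi_0,\qquad \bar{q} = 1 - \frac{\delta \varepsilon_0}{4L} < 1.$$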
Because of the necessary and sufficient conditions for a minimum, we have

$$(f'(x_*), x - x_*) \ge 0$$

for all $x \in \Omega$. Therefore for all the $w$, $\|w\| \le \delta\,\|x - x_*\|^2$,

$$\left(f'(x_*), \frac{x + x_*}{2} + w - x_*\right) \ge 0.$$

Hence

$$\frac{1}{2}\,(f'(x_*), x - x_*) \ge \delta\,\|x - x_*\|^2\,\|f'(x_*)\|.$$

But $\|f'(x_*)\| \ge \varepsilon_0$, and since $f(x)$ is convex, we have

$$f(x) - f(x_*) \ge (f'(x_*), x - x_*).$$

Finally, we obtain

$$f(x) - f(x_*) \ge 2\delta \varepsilon_0\,\|x - x_*\|^2. \tag{3.13}$$

Hence

$$\|x_k - x_*\| \le \left(\frac{\varphi_0}{2\delta \varepsilon_0}\right)^{1/2} \left(\bar{q}^{\,1/2}\right)^k.$$

With the notation

$$C = \left(\frac{\varphi_0}{2\delta \varepsilon_0}\right)^{1/2},\qquad q = \left(1 - \frac{\delta \varepsilon_0}{4L}\right)^{1/2}$$

we obtain

$$\|x_k - x_*\| \le Cq^k.$$

Q.E.D.


Newton's Method with Step Adjustment


We consider now the problem of the minimization of a convex smooth function $f(x)$ in a convex compact set $\Omega$.

For solving this problem the iterative process

$$x_{k+1} = x_k + \alpha_k p_k,\qquad \alpha_k > 0, \tag{3.14}$$

can be used, in which the direction of motion is $p_k = \bar{x}_k - x_k$, where $\bar{x}_k$ is the solution of the problem of minimization in set $\Omega$ of the quadratic function

$$\psi_k(x) = (f'(x_k), x - x_k) + \frac{1}{2}\,(f''(x_k)(x - x_k), x - x_k),$$

and as $\alpha_k$ we take the maximum value of parameter $\alpha$ obtained by reducing the initial $\alpha = 1$ until the parameter satisfies the inequality

$$f(x_k + \alpha p_k) - f(x_k) \le \varepsilon \alpha \psi_k(\bar{x}_k),\qquad 0 < \varepsilon < 1. \tag{3.15}$$

Parameter $\alpha_k$ can be chosen by other methods analogous to those described in Sec. 2, Chap. II (methods (2.2), (2.3)).

We shall ascertain below that the rate of convergence of Newton's method under definite conditions is either superlinear or quadratic. Consequently, if the problem of the minimization of function $\psi_k(x)$, $x \in \Omega$, can be solved easily enough, then Newton's method proves very effective.
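In one dimension, with $\Omega = [a, b]$, the quadratic subproblem is solved by clamping the unconstrained Newton point to the interval, which makes the process (3.14), (3.15) easy to sketch. The code below is an illustration under assumptions: halving is used as the rule for reducing $\alpha$, a small tolerance replaces the exact test $\psi_k(\bar{x}_k) = 0$, and the function name and default parameters are hypothetical.

```python
def newton_interval(f, df, d2f, x0, a, b, eps=0.5, max_iter=50, tol=1e-12,
                    max_halvings=60):
    """Newton's method with step adjustment for a convex f on the interval [a, b]."""
    x = x0
    for _ in range(max_iter):
        g, h = df(x), d2f(x)
        if h > 0:
            x_bar = min(max(x - g / h, a), b)   # minimizer of psi_k on [a, b]
        else:
            x_bar = a if g > 0 else b           # psi_k is linear when h == 0
        p = x_bar - x
        psi = g * p + 0.5 * h * p * p           # psi_k(x_bar) <= psi_k(x_k) = 0
        if psi >= -tol:
            break
        alpha = 1.0
        for _ in range(max_halvings):
            if f(x + alpha * p) - f(x) <= eps * alpha * psi:   # inequality (3.15)
                break
            alpha /= 2
        else:
            break                               # no acceptable step was found
        x = x + alpha * p
    return x


f = lambda x: (x - 1.0) ** 2
df = lambda x: 2.0 * (x - 1.0)
d2f = lambda x: 2.0
at_boundary = newton_interval(f, df, d2f, x0=3.0, a=2.0, b=3.0)
interior = newton_interval(f, df, d2f, x0=3.0, a=0.0, b=3.0)
```

For a quadratic $f$ the model $\psi_k$ is exact, so the full step $\alpha = 1$ is accepted and the minimum over the interval is reached in one iteration, either at the boundary point or at the interior point.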


Properties of Newton’s Method 


Theorem 3.3. /f for the minimization of a convex twice continuously 
differentiable function f (x) in a convex closed bounded set Q we use 
method (3.14) in which a, and p, are determined as described above, 
then (whatever the initial approximation x, € 9 chosen): 

(1) f (x,) decreases monotonically, 


(2) Tim f (2x) =f (#4) = min f (2). 

Proof. There is a minimum point x, (possibly not unique) of the 
continuous function tf, (x) in compact set & (Weierstrass’ theorem). 
With any & point 2p4,€& since %p44=2%,+ Op (Tp — Ty) = On rT, + 

+ (1—a,) z, and a €[0, f]. Since function pn () iS Convex, we 
have pp (©n41) = Wr (na + (1 — Ap) Tx) nth, (Te) 4+ (1 —@n) Wr (La). 
But , (z,) =O, therefore 


Dr (Tati) < On Pr (Zp). (3.16) 
Now making use of meyions rormula and of (3.16), we have: 
f (Xn+1) — f(a) = Pr (ati) = oh OF yr, Ph) 


< ap ok | Fr || Il Pr I{- 3.47 
Santa (aa) (14) AD 
where Fy = f" (nc) — fF (fn), Lee = Le + O (Tati — Zz), 9 € [0, 1]. 
It follows that if ; (z,) 0 (in this case wp, (z,) <<, (xx) = 9), 
then with a certain a, > O the inequality 


1-4 apy || Fp || ll Pa ll* >e (3.18) 
Uh (rR) 


holds. But at the same time the inequality (3.15) holds as well and 
this proves that the described method of L gnoosing a, may be applied. 

It follows from (3.15) that $f(x_{k+1}) < f(x_k)$. We show now that $\psi_k(\bar{x}_k) \to 0$ as $k \to \infty$. In a closed bounded set $\Omega$ the continuous function $f''(x)$ has an upper bound: $\|f''(x)\| \le M$. Consequently, $\|F_k\| \le 2M$, and vector $p_k$ has an upper bound too: $\|p_k\| \le \max_{x, y \in \Omega}\|x - y\| = d$. Suppose that with any $k$ we have $\psi_k(\bar{x}_k) \le -\beta < 0$. Then

$$1 + \frac{\alpha_k\|F_k\|\,\|p_k\|^2}{2\psi_k(\bar{x}_k)} \ge 1 - \frac{\alpha_k M d^2}{\beta}$$

and, hence, inequality (3.18) (and therefore (3.15) too) is always satisfied even with a fixed $\alpha_k = C > 0$ (any $C \le (1-\varepsilon)\beta/(Md^2)$ suffices). But it follows from (3.15) that at the same time $f(x_{k+1}) - f(x_k) \le -\varepsilon C\beta$ with any $k$ and this contradicts the fact that in the compact set $\Omega$ function $f(x)$ has a lower bound.


Thus the condition $\psi_k(\bar{x}_k) \le -\beta$ with any $k$ cannot be fulfilled, i.e. in any case as $k \to \infty$ the condition $\psi_k(\bar{x}_k) \to 0$ must be fulfilled. This means that at any limit point of sequence (3.14) the necessary (and sufficient, because of the convexity of $f(x)$) condition for a minimum of function $f(x)$ in set $\Omega$ (see Chap. I, Sec. 4) is fulfilled. Taking this into account, the last assertion of the theorem can be proved as in theorem 3.1.
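
The damped iteration of method (3.14) can be illustrated by a minimal one-dimensional sketch in which $\Omega$ is an interval, so that the minimizer of the quadratic model over $\Omega$ is obtained by clipping the Newton point. The acceptance test is assumed here to have the form $f(x_k + \alpha p_k) - f(x_k) \le \varepsilon\alpha\psi_k(\bar{x}_k)$ with $\alpha$ halved starting from unity, which is how (3.15) is used in the proof above. The name `newton_interval`, the tolerances and the test function are illustrative assumptions and are not taken from the text.

```python
import math

def newton_interval(f, df, d2f, lo, hi, x0, eps=0.25, tol=1e-12, max_iter=100):
    """Damped Newton iteration on the interval [lo, hi] (assumes d2f(x) > 0)."""
    x = x0
    for _ in range(max_iter):
        g, h = df(x), d2f(x)
        # Minimizer of the quadratic model psi_k over [lo, hi]: clipped Newton point.
        x_bar = min(hi, max(lo, x - g / h))
        p = x_bar - x
        psi = g * p + 0.5 * h * p * p      # psi_k(x_bar) <= 0
        if psi > -tol:                      # psi_k(x_bar) ~ 0: optimality reached
            return x
        alpha = 1.0
        # Halve the step until the assumed form of test (3.15) is satisfied.
        while f(x + alpha * p) - f(x) > eps * alpha * psi and alpha > 1e-16:
            alpha *= 0.5
        x += alpha * p
    return x

# Hypothetical test function: f(x) = exp(x) - 2x, strongly convex on bounded intervals.
f = lambda x: math.exp(x) - 2.0 * x
df = lambda x: math.exp(x) - 2.0
d2f = lambda x: math.exp(x)

x_boundary = newton_interval(f, df, d2f, -1.0, 0.5, -1.0)   # minimum on the boundary
x_interior = newton_interval(f, df, d2f, -1.0, 2.0, -1.0)   # minimum at ln 2
```

In the second call the unit step from $x_0 = -1$ is rejected and the step is halved once; after that the iteration proceeds with $\alpha_k = 1$, in agreement with the behaviour described for strongly convex functions.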

The theorem just proved shows that in the problem under consideration, as distinct from unconstrained minimization problems when Newton's method can be applied only to minimize strongly convex functions, this method can also be applied to minimize convex functions, since set $\Omega$ is bounded. However, the application of Newton's method to the minimization of strongly convex functions is of the greatest interest, for just in this case the method converges to the solution at a fast rate.

Theorem 3.4. If in addition to the conditions of theorem 3.3, function $f(x)$ is strongly convex, i.e.

$$m\|y\|^2 \le (f''(x)y, y) \le M\|y\|^2, \quad m > 0, \quad x \in \Omega, \quad y \in E^n, \tag{3.19}$$

then sequence (3.14) converges to the solution at a superlinear rate (i.e. the estimate (2.5), Chap. II, holds).


Proof. The existence and uniqueness of the solution of the problem under consideration follow from the general results of Sec. 3, Chap. I.

At point $\bar{x}_k$ the necessary condition for a minimum of function $\psi_k(x)$ in set $\Omega$ (Sec. 4 of Chap. I)

$$(\psi_k'(\bar{x}_k), \bar{x}_k - x_k) \le 0$$

is fulfilled, i.e.

$$(f'(x_k), \bar{x}_k - x_k) + (f''(x_k)(\bar{x}_k - x_k), \bar{x}_k - x_k) \le 0.$$

Hence

$$(f'(x_k), p_k) \le -(f''(x_k)p_k, p_k). \tag{3.20}$$




CONSTRAINED FUNCTION MINIMIZATION 


Taking into account this and the left-hand estimate (3.19), we find that

$$\psi_k(\bar{x}_k) \le -\frac{m}{2}\|p_k\|^2. \tag{3.21}$$

Using this estimate we obtain from (3.17)

$$f(x_{k+1}) - f(x_k) \le \alpha_k\psi_k(\bar{x}_k)\left(1 - \frac{\alpha_k\|F_k\|}{m}\right). \tag{3.22}$$


Since $\psi_k(\bar{x}_k) \to 0$ (theorem 3.3), it follows from (3.21) that $\|p_k\| \to 0$ as $k \to \infty$. Hence, because of the uniform continuity of the second derivative $f''(x)$ on set $\Omega$, we have that $\|F_k\| \to 0$. But (3.22) implies that from a certain $k = N_1(\varepsilon)$ on, inequality (3.15) will be satisfied with $\alpha_k = 1$, i.e. method (3.14) is transformed into the usual Newton's method with a unity step. With $k > N_1(\varepsilon)$, taking into account the convexity of $\psi_k(x)$, we obtain

$$\psi_k(\bar{x}_k) = \psi_k(x_{k+1}) \ge (f'(x_k), x_{k+1} - x_k) = (f'(x_{k-1}), x_{k+1} - x_k) + (f'(x_k) - f'(x_{k-1}), x_{k+1} - x_k).$$

Transforming the last scalar product by means of Lagrange's formula for operators (Chap. I, Sec. 5) we obtain

$$\psi_k(\bar{x}_k) \ge (f'(x_{k-1}) + f''(x_{k-1})(x_k - x_{k-1}), x_{k+1} - x_k) + (D_k(x_k - x_{k-1}), x_{k+1} - x_k) \tag{3.23}$$

where

$$D_k = f''(x_{k-1} + \theta_k(x_k - x_{k-1})) - f''(x_{k-1}), \quad \theta_k \in [0, 1].$$

Note that $(f'(x_{k-1}) + f''(x_{k-1})(x_k - x_{k-1}), x_{k+1} - x_k) = (\psi_{k-1}'(x_k), x_{k+1} - x_k)$. Since $\psi_{k-1}(x_k) = \min_{x\in\Omega}\psi_{k-1}(x)$, we shall have with any $x \in \Omega$ that $(\psi_{k-1}'(x_k), x - x_k) \ge 0$ (the necessary condition for a minimum). Consequently, $(\psi_{k-1}'(x_k), x_{k+1} - x_k) \ge 0$ holds and therefore it follows from (3.23) that

$$-\psi_k(\bar{x}_k) \le \|D_k\|\,\|x_k - x_{k-1}\|\,\|x_{k+1} - x_k\| = \|D_k\|\,\|p_{k-1}\|\,\|p_k\|. \tag{3.24}$$

Comparing estimates (3.21) and (3.24), we obtain

$$\|x_{k+1} - x_k\| \le \frac{2\|D_k\|}{m}\|x_k - x_{k-1}\|. \tag{3.25}$$


Because of the uniform continuity of $f''(x)$ on set $\Omega$, we have $\|D_k\| \to 0$. Consequently, there is a number $N(\varepsilon)$ such that with $k \ge N(\varepsilon)$ we find $\lambda_k = 2\|D_k\|/m \le \lambda < 1$. Let us take $\|x_N - x_{N-1}\| = C_1$, $1 - \lambda = \gamma > 0$. Then

$$\|x_l - x_{N+i}\| \le \sum_{k=N+i}^{l-1}\|x_{k+1} - x_k\| \le C_1\lambda_N\lambda_{N+1}\cdots\lambda_{N+i}\,(1 + \lambda_{N+i+1} + \lambda_{N+i+1}\lambda_{N+i+2} + \dots) \le \frac{C_1}{\gamma}\lambda_N\cdots\lambda_{N+i} = C\lambda_N\lambda_{N+1}\cdots\lambda_{N+i}.$$

Consequently, as $l, i \to \infty$, $\|x_l - x_{N+i}\| \to 0$, i.e. sequence $\{x_k\}$ is fundamental and, because of the completeness of space $E^n$, has a limit $x_* \in \Omega$, and

$$\|x_{N+i} - x_*\| \le C\lambda_N\lambda_{N+1}\cdots\lambda_{N+i}. \tag{3.26}$$

By theorem 3.3,

$$\lim_{k\to\infty} f(x_k) = f(x_*) = \min_{x\in\Omega} f(x).$$

Thus sequence (3.14) converges to the solution and the rate of convergence, as can be seen from estimate (3.26), is superlinear. The theorem is proved.

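If the second derivative $f''(x)$ on set $\Omega$ satisfies Lipschitz' condition with constant $R$, then inequality (3.25) takes the form

$$\|x_{k+1} - x_k\| \le \frac{2R}{m}\|x_k - x_{k-1}\|^2. \tag{3.27}$$

We shall use the notation $\beta_k = \frac{2R}{m}\|x_{k+1} - x_k\|$. Since $\|x_{k+1} - x_k\| \to 0$, there is a number $L(\varepsilon)$ such that with $k \ge L(\varepsilon)$ we have $\beta_k < 1$. Taking into account (3.27) with $k > L$, we have

$$\beta_k \le \beta_{k-1}^2 \le \dots \le \beta_L^{2^{k-L}}.$$

Consequently, for any $i > L + l$, $l = 0, 1, \dots$,

$$\|x_i - x_{L+l}\| \le \sum_{k=L+l}^{i-1}\|x_{k+1} - x_k\| = \frac{m}{2R}\sum_{k=L+l}^{i-1}\beta_k \le \frac{m}{2R}\sum_{k=L+l}^{i-1}\beta_L^{2^{k-L}}.$$

Since $x_i \to x_*$, we have $\|x_{L+l} - x_*\| = \lim_{i\to\infty}\|x_{L+l} - x_i\|$, i.e.

$$\|x_{L+l} - x_*\| \le \frac{m}{2R}\sum_{k=L+l}^{\infty}\beta_L^{2^{k-L}}.$$

This estimate can be given the form

$$\|x_{L+l} - x_*\| \le C\beta_L^{2^l}, \quad C < \infty$$

(taking into account that the series $\sum_k \beta^{2^k}$ converges). The estimate obtained means that the following theorem is valid.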
Theorem 3.5. If the conditions of theorem 3.4 are fulfilled and, besides, matrix $f''(x)$ on set $\Omega$ satisfies Lipschitz' condition with constant $R$, then sequence (3.14) (in which $\alpha_k$ and $p_k$ are chosen by the method described above) converges to the solution at a quadratic rate.
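
The convergence of the series used in the last estimate, and an explicit value of the constant, can be checked directly. The following worked bound is a sketch under the assumption that $0 < \beta < 1$ stands for $\beta_L$ and that the summation index has been shifted by $L$; the resulting constant $C = m/(2R(1-\beta_L))$ is one admissible choice and is not stated in the text.

```latex
% Assumption: 0 < \beta < 1 (here \beta = \beta_L), l \ge 0.
\sum_{k=l}^{\infty} \beta^{2^k}
  = \beta^{2^l} \sum_{j=0}^{\infty} \beta^{\,2^l (2^j - 1)}
  \le \beta^{2^l} \sum_{j=0}^{\infty} \beta^{\,j}
  = \frac{\beta^{2^l}}{1-\beta},
\qquad \text{since } 2^l (2^j - 1) \ge j .
```

Since the exponent $2^l$ doubles with every step, the error bound is squared from one iteration to the next, which is exactly the quadratic rate asserted in the theorem.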

We shall study now the properties of Newton's method with the choice of $\alpha_k$ under the condition of $f(x)$ having the minimum value in the direction of motion.

The arguments we used to estimate the rate of convergence of Newton's method in unconstrained problems (Sec. 2, Chap. II) are not suited to this case (since the right-hand estimate (1.11) of Chap. II does not hold).

Lemma 3.3. If function $f(x)$ for which conditions (3.19) are fulfilled is being minimized and $\alpha_k$ in method (3.14) is chosen under the condition

$$f(x_k + \alpha_k p_k) = \min_{0 \le \alpha \le 1} f(x_k + \alpha p_k), \tag{3.28}$$

then $x_k \to x_*$ and $\alpha_k \to 1$ as $k \to \infty$.
Proof. By Taylor's formula

$$f(x_{k+1}) - f(x_k) = \alpha_k(f'(x_k), p_k) + \frac{\alpha_k^2}{2}(f''(x_{kc})p_k, p_k).$$

With the value of $\alpha_k$ satisfying (3.28), the right-hand side of the last equality considered as a function of the variable $\alpha$ must attain its minimum. Therefore, it can be easily ascertained, taking into account estimates (3.19) and (3.20), that

$$1 \ge \alpha_k \ge -\frac{(f'(x_k), p_k)}{M\|p_k\|^2} \ge \frac{m\|p_k\|^2}{M\|p_k\|^2} = \frac{m}{M}.$$

Thus $\alpha_k \ge C > 0$; therefore in the same way as in theorem 3.3 it can be shown that $\psi_k(\bar{x}_k) \to 0$, i.e. sequence (3.14) in which $\alpha_k$ is chosen under condition (3.28) converges to the solution. At the same time $\|p_k\| \to 0$ and $\|F_k\| \to 0$ (theorem 3.4).

We shall demonstrate that $\alpha_k \to 1$:

$$f(x_{k+1}) - f(x_k) = \psi_k(x_{k+1}) + \frac{\alpha_k^2}{2}(F_k p_k, p_k) = \psi_k(\bar{x}_k) + (\psi_k'(\bar{x}_k), x_{k+1} - \bar{x}_k) + \frac{1}{2}(\psi_k''(\bar{x}_k)(x_{k+1} - \bar{x}_k), x_{k+1} - \bar{x}_k) + \frac{\alpha_k^2}{2}(F_k p_k, p_k).$$

Taking into account that $x_{k+1} - \bar{x}_k = (\alpha_k - 1)p_k$ we obtain:

$$f(x_{k+1}) - f(x_k) = \psi_k(\bar{x}_k) + (\psi_k'(\bar{x}_k), x_{k+1} - \bar{x}_k) + \frac{(\alpha_k - 1)^2}{2}(\psi_k''(\bar{x}_k)p_k, p_k) + \frac{\alpha_k^2}{2}(F_k p_k, p_k).$$

Note that $(\psi_k'(\bar{x}_k), x_{k+1} - \bar{x}_k) \ge 0$ and $(\psi_k''(\bar{x}_k)p_k, p_k) = (f''(x_k)p_k, p_k) \ge m\|p_k\|^2$. At the same time $(F_k p_k, p_k) = o(\|p_k\|^2)$ (since $\|F_k\| \to 0$ as $\|p_k\| \to 0$). Consequently, the minimum of the difference $f(x_{k+1}) - f(x_k) - \psi_k(\bar{x}_k)$ can be attained only as $\alpha_k \to 1$; otherwise, with any $k$ we should have $1 - \alpha_k \ge \beta > 0$ and at the same time $f(x_{k+1}) - f(x_k) - \psi_k(\bar{x}_k) = O(\|p_k\|^2) > 0$, whereas with $\alpha = 1$ the difference $f(\bar{x}_k) - f(x_k) - \psi_k(\bar{x}_k) = \frac{1}{2}(F_k p_k, p_k) = o(\|p_k\|^2)$, i.e. with a sufficiently great $k$ we should always have $f(\bar{x}_k) < f(x_{k+1})$; this contradicts the condition for choosing $\alpha_k$. The lemma is proved.
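
Condition (3.28) requires a one-dimensional minimization along the direction of motion. A minimal sketch of one way to carry it out is given below, assuming that the step is restricted to $0 \le \alpha \le 1$ and that $\varphi(\alpha) = f(x_k + \alpha p_k)$ is convex (hence unimodal) on that segment. Golden-section search is used here as a stand-in technique; the text does not prescribe a particular line-search procedure, and the name `exact_step` is hypothetical.

```python
import math

def exact_step(phi, lo=0.0, hi=1.0, tol=1e-10):
    """Golden-section search for the minimizer of a unimodal phi on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0     # 1 / golden ratio
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = phi(c), phi(d)
    while b - a > tol:
        if fc <= fd:
            # The minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = phi(c)
        else:
            # The minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = phi(d)
    return 0.5 * (a + b)

# Hypothetical examples: an interior minimizer and a minimizer at alpha = 1.
alpha_interior = exact_step(lambda t: (t - 0.3) ** 2)
alpha_boundary = exact_step(lambda t: (t - 2.0) ** 2)
```

The second example corresponds to the situation established by the lemma for large $k$: the unconstrained minimizer along the direction lies at or beyond the full step, so the search returns $\alpha_k \approx 1$.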
Theorem 3.6. If function $f(x)$ satisfies the requirements of theorem 3.4 and in method (3.14) parameter $\alpha_k$ is chosen under the condition (3.28), then $x_k \to x_*$ at a superlinear rate.
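Proof. By estimates (3.16) and (3.21),

$$\psi_k(x_{k+1}) \le \alpha_k\psi_k(\bar{x}_k) \le \alpha_k^2\psi_k(\bar{x}_k) \le -\frac{\alpha_k^2}{2}m\|p_k\|^2,$$

i.e.

$$\psi_k(x_{k+1}) \le -\frac{m}{2}\|x_{k+1} - x_k\|^2. \tag{3.29}$$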


On the other hand, $\psi_k(x_{k+1}) \ge (f'(x_k), x_{k+1} - x_k) = (f'(x_k), x_{k+1} - \bar{x}_{k-1}) + (f'(x_k), \bar{x}_{k-1} - x_k)$. Since at point $x_k$ the minimum of $f(x)$ is attained in the direction $\bar{x}_{k-1} - x_{k-1}$, we have $(f'(x_k), \bar{x}_{k-1} - x_k) = 0$. Hence, we have

$$\psi_k(x_{k+1}) \ge (f'(x_k), x_{k+1} - \bar{x}_{k-1}) = (f'(x_{k-1}), x_{k+1} - \bar{x}_{k-1}) + (f'(x_k) - f'(x_{k-1}), x_{k+1} - \bar{x}_{k-1}).$$

Expressing the last term on the right-hand side by means of Lagrange's formula for operators, we obtain after some transformation

$$\psi_k(x_{k+1}) \ge (f'(x_{k-1}) + f''(x_{k-1})(x_k - x_{k-1}), x_{k+1} - \bar{x}_{k-1}) + (D_k(x_k - x_{k-1}), x_{k+1} - \bar{x}_{k-1})$$

where $D_k = f''(x_{k-1} + \theta(x_k - x_{k-1})) - f''(x_{k-1})$, $\theta \in [0, 1]$. Taking into account that $x_k - x_{k-1} = \alpha_{k-1}(\bar{x}_{k-1} - x_{k-1})$, we find that

$$\psi_k(x_{k+1}) \ge (f'(x_{k-1}) + f''(x_{k-1})(\bar{x}_{k-1} - x_{k-1}), x_{k+1} - \bar{x}_{k-1}) + ((\alpha_{k-1} - 1)f''(x_{k-1})(\bar{x}_{k-1} - x_{k-1}) + D_k(x_k - x_{k-1}), x_{k+1} - \bar{x}_{k-1}).$$


Since

$$\psi_{k-1}(\bar{x}_{k-1}) = \min_{x\in\Omega}\psi_{k-1}(x),$$

we have $(\psi_{k-1}'(\bar{x}_{k-1}), \bar{x}_{k-1} - x) \le 0$ for all $x \in \Omega$ (the necessary condition for a minimum). Consequently

$$(\psi_{k-1}'(\bar{x}_{k-1}), \bar{x}_{k-1} - x_{k+1}) = (f'(x_{k-1}) + f''(x_{k-1})(\bar{x}_{k-1} - x_{k-1}), \bar{x}_{k-1} - x_{k+1}) \le 0.$$

Taking this into account we see that the following estimate holds:

$$\psi_k(x_{k+1}) \ge \left(\left(\frac{\alpha_{k-1} - 1}{\alpha_{k-1}}f''(x_{k-1}) + D_k\right)(x_k - x_{k-1}), x_{k+1} - \bar{x}_{k-1}\right).$$

Hence, with the notation $\|[(\alpha_{k-1} - 1)/\alpha_{k-1}]f''(x_{k-1}) + D_k\| = b_k$, we obtain

$$-\psi_k(x_{k+1}) \le b_k\|x_k - x_{k-1}\|\,\|x_{k+1} - \bar{x}_{k-1}\| \le b_k\|x_k - x_{k-1}\|\,(\|x_{k+1} - x_k\| + \|\bar{x}_{k-1} - x_k\|).$$

Taking now into account that

$$\bar{x}_{k-1} - x_k = \frac{1 - \alpha_{k-1}}{\alpha_{k-1}}(x_k - x_{k-1})$$

and denoting $[(1 - \alpha_{k-1})/\alpha_{k-1}]\,b_k = c_k$, we obtain

$$-\psi_k(x_{k+1}) \le b_k\|x_k - x_{k-1}\|\,\|x_{k+1} - x_k\| + c_k\|x_k - x_{k-1}\|^2.$$
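Since $\alpha_k \to 1$, $\|p_k\| \to 0$ (lemma 3.3), we have $b_k \to 0$, $c_k \to 0$. Comparing the estimate obtained with (3.29) we establish that

$$\|x_{k+1} - x_k\|^2 \le \varepsilon_k\|x_k - x_{k-1}\|\,\|x_{k+1} - x_k\| + \delta_k\|x_k - x_{k-1}\|^2$$

where

$$\varepsilon_k = \frac{2b_k}{m}, \quad \delta_k = \frac{2c_k}{m}.$$

Finally, having solved the quadratic inequality obtained for $\|x_{k+1} - x_k\|$, we find that

$$\|x_{k+1} - x_k\| \le \mu_k\|x_k - x_{k-1}\|$$

where

$$\mu_k = \frac{\varepsilon_k + \sqrt{\varepsilon_k^2 + 4\delta_k}}{2} \to 0 \quad \text{as } k \to \infty.$$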


The remaining part of the proof is performed just as in theorem 3.4. 


4. CUTTING HYPERPLANE METHOD 


The cutting hyperplane method is meant for solving problems of convex programming. The basic idea of the method is that the admissible domain is approximated by a certain polyhedron which diminishes from one iteration step to the next, giving a better and better approximation to the admissible region about the solution.

The method is applied to the problem of convex programming formulated as follows: to minimize $f_0(x) = (c, x)$ with the constraint

$$f(x) \le 0 \tag{4.1}$$

where $f(x)$ is a continuous convex function.

The fact that function $f_0(x)$ to be minimized is linear and the constraint (4.1) consists of only one inequality is of no great consequence, since if the region is defined by several inequalities

$$f_i(x) \le 0, \quad i = 1, \dots, m \tag{4.2}$$

with convex functions $f_i(x)$, then the system of inequalities can be rewritten in the form (4.1) setting

$$f(x) = \max_{1 \le i \le m} f_i(x).$$

If convex function $f_0(x)$ is nonlinear, then by introducing an additional coordinate $x^{n+1}$ and adding the inequality

$$f_{m+1}(x, x^{n+1}) = f_0(x) - x^{n+1} \le 0$$

to the system (4.2), we can reduce the problem to the minimization of the linear function $x^{n+1}$ with constraints (4.2). Therefore, we shall study the problem in the form (4.1). Before going on to the description of the algorithm we remind the reader that vector $a$ is a support vector to $f(x)$ at point $x_0$ if $f(x) \ge f(x_0) + (a, x - x_0)$ for all $x$. It follows from the results of Sec. 2, Chap. I that for a continuous convex function the set of such vectors is nonempty at any point of space.


Algorithm 


Let

$$\Omega = \{x : f(x) \le 0\}$$

be a nonempty admissible region. Suppose also that $\Omega$ is compact and vectors $a_k$, $k = -l, -(l-1), \dots, -1, 0$ and numbers $b_k$ are known such that region

$$S = \{x : (a_k, x) - b_k \le 0, \ k = -l, \dots, 0\}$$

is compact and contains $\Omega$.

For $k \ge 0$ the successive approximations are determined by the following rule. We set $S_0 = S$. If $S_k$ has been constructed, then $x_k$ is any solution of the problem of linear programming: to minimize $f_0(x) = (c, x)$ with $x \in S_k$. The next region $S_{k+1}$ is constructed by the following rule:

$$S_{k+1} = \{x : (a_{k+1}, x) - b_{k+1} \le 0\} \cap S_k \tag{4.3}$$

where $a_{k+1}$ is a support vector to $f(x)$ at point $x_k$ and

$$b_{k+1} = (a_{k+1}, x_k) - f(x_k). \tag{4.4}$$

It follows from (4.3) that $S_{k+1} \subset S_k$ and for $k \ge 1$

$$S_k = \{x : (a_j, x) - b_j \le 0, \ j = -l, \dots, -1, 0, 1, \dots, k\}. \tag{4.5}$$


Lemma 4.1. For all $k \ge 1$, $\Omega \subset S_k$.

Proof. Let $x \in \Omega$, i.e. $f(x) \le 0$. Then

$$0 \ge f(x) \ge f(x_{j-1}) + (a_j, x - x_{j-1}) = (a_j, x) - b_j$$

and, consequently, $(a_j, x) - b_j \le 0$, $j = 1, \dots, k$. With $j \le 0$ the last inequalities hold by virtue of choosing $a_j$ and $b_j$ for $j \le 0$. The lemma is proved.
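
The rule (4.3), (4.4) can be sketched in a few lines for a two-dimensional problem. In the sketch below the linear programming subproblem is solved by brute-force enumeration of the vertices of the polygon $S_k$, which is a stand-in chosen only to keep the example self-contained (it is not the simplex method discussed later), and the initial region $S$ is assumed to be a square box. The function names, the stopping tolerance and the test problem (a linear function on the unit disc) are illustrative assumptions.

```python
import math

def solve_lp_2d(c, cons, tol=1e-9):
    """Minimize (c, x) over {x : (a, x) <= b for (a, b) in cons} in 2-D
    by enumerating vertices (assumes the feasible polygon is bounded)."""
    best, best_val = None, math.inf
    for i in range(len(cons)):
        a1, b1 = cons[i]
        for j in range(i + 1, len(cons)):
            a2, b2 = cons[j]
            det = a1[0] * a2[1] - a1[1] * a2[0]
            if abs(det) < 1e-12:            # parallel lines: no vertex
                continue
            x = ((b1 * a2[1] - a1[1] * b2) / det,
                 (a1[0] * b2 - b1 * a2[0]) / det)
            if all(a[0] * x[0] + a[1] * x[1] <= b + tol for a, b in cons):
                val = c[0] * x[0] + c[1] * x[1]
                if val < best_val:
                    best, best_val = x, val
    return best

def cutting_plane(c, f, grad, box, tol=1e-8, max_iter=60):
    """Cutting hyperplane iteration (4.3)-(4.4) started from the box [lo, hi]^2."""
    lo, hi = box
    cons = [((1.0, 0.0), hi), ((-1.0, 0.0), -lo),
            ((0.0, 1.0), hi), ((0.0, -1.0), -lo)]
    x = None
    for _ in range(max_iter):
        x = solve_lp_2d(c, cons)
        if x is None:
            raise ValueError("LP subproblem has no feasible vertex")
        fx = f(x)
        if fx <= tol:                       # x_k is (almost) admissible: stop
            break
        a = grad(x)                         # support vector a_{k+1} at x_k
        b = a[0] * x[0] + a[1] * x[1] - fx  # b_{k+1} by formula (4.4)
        cons.append((a, b))
    return x

# Hypothetical test problem: minimize -x1 - 2*x2 on the unit disc.
disc = lambda x: x[0] ** 2 + x[1] ** 2 - 1.0
disc_grad = lambda x: (2.0 * x[0], 2.0 * x[1])
x_cut = cutting_plane((-1.0, -2.0), disc, disc_grad, (-2.0, 2.0))
```

Every iterate lies outside or on the boundary of the admissible region, and the values $f_0(x_k)$ increase towards the optimal value from below, as stated in the text that follows.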

It follows directly from lemma 4.1 that

$$f_0(x_0) \le f_0(x_1) \le \dots \le f_0(x_k) \le f_0(x_{k+1}) \le \dots$$

On the other hand, if $x_*$ is the minimum point of $f_0(x)$ in $\Omega$, then $f_0(x_k) \le f_0(x_*)$ since $S_k \supset \Omega$.

Theorem 4.1. Let $f(x)$ be a continuous convex function, region $\Omega$ be compact and there be a number $K$ such that with each $x \in S$ vector $a$ which is a support vector to $f(x)$ at point $x$ satisfies the inequality $\|a\| \le K$. Then any limit point $\bar{x}$ of sequence $\{x_k\}$, $k = 0, 1, \dots$, is a solution of problem (4.1) and $f(x_k) \to 0$.

Proof. Since $S_0 = S$, $S_k \supset S_{k+1}$, the whole sequence $\{x_k\}$ belongs to the compact set $S$. Therefore there are always limit points of this sequence.

Note now that if $f(x_k) \le 0$ for a certain $k$, then $x_k \in \Omega$ and, consequently, $f_0(x_k) \ge f_0(x_*)$. However, as was shown, $f_0(x_k) \le f_0(x_*)$. Thus, $f_0(x_k) = f_0(x_*)$, i.e. $x_k$ is the solution of the original problem.

Let now sequence $\{x_k\}$ be infinite and $f(x_k) > 0$ for all $k$. We shall show that $f(x_k) \to 0$. Suppose that the contrary is true. Then there is a number $r > 0$ and a subsequence of indices $k$ (we denote it by $J$) such that $f(x_k) \ge r$, $k \in J$. Without loss of generality, we can take that $x_k \to \bar{x}$, $k \in J$, since $\{x_k\}$ belongs to a compact set.

Let now $k$ and $j$ belong to $J$ and $k > j$. Then by construction, point $x_k$ satisfies the inequality

$$(a_{j+1}, x_k) - b_{j+1} = (a_{j+1}, x_k - x_j) + f(x_j) \le 0$$

hence

$$f(x_j) \le (a_{j+1}, x_j - x_k) \le K\|x_j - x_k\|.$$

But $\{x_k\}$, $k \in J$ converges to $\bar{x}$ and therefore $\|x_j - x_k\| \le r/(2K)$ for all sufficiently great $k$ and $j$ and so $f(x_j) \le r/2$ for great $j$, and this contradicts the fact that $f(x_j) \ge r$, $j \in J$.


Thus we have shown that $f(x_k) \to 0$. Let now $\bar{x}$ be any limit point, i.e. $x_k \to \bar{x}$, $k \in J$, where $J$ is a subsequence of indices. Then because of the continuity of $f(x)$,

$$f(\bar{x}) = \lim_{k\to\infty,\ k\in J} f(x_k) = 0,$$

i.e. $\bar{x} \in \Omega$. On the other hand, $f_0(x_k) \le f_0(x_*)$ and therefore $f_0(\bar{x}) \le f_0(x_*)$; it follows directly that $f_0(\bar{x}) = f_0(x_*)$ and $\bar{x}$ is also a solution of problem (4.1). The theorem is proved.


Computational Aspects 


The algorithm of the cutting hyperplane method requires at each step the solution of the problem of linear programming: to minimize $f_0(x) = (c, x)$ with constraints

$$(a_i, x) - b_i \le 0, \quad i = -l, \dots, k. \tag{4.6}$$

Thus the size of the problem being solved increases at every step. The computer memory required for storing vectors $a_k$ also increases. In order to simplify the solving of problem (4.6), it is expedient to solve instead the dual problem which in this case takes the following form: to maximize $-\sum_{i=-l}^{k} u^i b_i$ with constraints

$$\sum_{i=-l}^{k} u^i a_i + c = 0, \quad u^i \ge 0, \quad i = -l, \dots, k.$$

In solving this problem by the simplex method the solution of the preceding problem serves as the trial solution for the next one. It is also a good approximation so that the solution of the new problem is obtained after a small number of iterations.

At each step of the algorithm it is necessary to compute vector $a_{k+1}$, a support vector to $f(x)$ at point $x_k$. Recall (see Chap. I) that if $f(x)$ is a differentiable function, then $a_{k+1} = f'(x_k)$. If $f(x)$ is the maximum of differentiable functions, i.e. $f(x) = \max_{1 \le i \le m} f_i(x)$, then we can take as $a_{k+1}$ any vector of the form

$$\sum_{i \in J(x_k)} \lambda_i f_i'(x_k), \quad \text{where } \lambda_i \ge 0, \quad \sum_{i \in J(x_k)} \lambda_i = 1, \quad J(x_k) = \{i : f_i(x_k) = f(x_k), \ 1 \le i \le m\};$$

in particular, we can take $a_{k+1} = f_i'(x_k)$, where $i$ is any index from $J(x_k)$.
The foregoing follows from what was set forth in Sec. 2, Chap. I.
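
The particular choice $a_{k+1} = f_i'(x_k)$ with $i \in J(x_k)$ is easy to sketch in code. The helper below is a hypothetical illustration (the name `support_vector_of_max` and the activity tolerance are assumptions): it evaluates all $f_i$ at the given point, picks the first index at which the maximum is attained, and returns the gradient of that function together with the value $f(x_k)$ needed in (4.4).

```python
def support_vector_of_max(fs, grads, x, tol=1e-12):
    """Return (a, f(x)) where a is a support vector to f = max_i f_i at x.

    fs    -- list of convex differentiable functions f_i
    grads -- list of their gradients, in the same order
    """
    values = [fi(x) for fi in fs]
    fmax = max(values)
    # First index of the active set J(x) = {i : f_i(x) = f(x)} (up to tol).
    i = next(k for k, v in enumerate(values) if v >= fmax - tol)
    return grads[i](x), fmax

# Hypothetical example: f(x) = max(x1^2 + x2^2, x1 + 3).
fs = [lambda x: x[0] ** 2 + x[1] ** 2, lambda x: x[0] + 3.0]
grads = [lambda x: (2.0 * x[0], 2.0 * x[1]), lambda x: (1.0, 0.0)]
a_vec, f_val = support_vector_of_max(fs, grads, (1.0, 1.0))
```

At the point $(1, 1)$ only the second function is active, so the returned vector is its gradient; the support inequality $f(y) \ge f(x) + (a, y - x)$ can then be checked at arbitrary points $y$.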




Concluding Remarks 


In describing the cutting hyperplane method we followed J. E. Kelley's paper. At present there are many modifications of this method. They can be found in a paper by E. S. Levitin and B. T. Polyak. However, none of these modifications seems to improve the main property which is of interest to us, viz. the rate of convergence. This rate has not been precisely estimated for the method described; however, the results obtained in the paper mentioned suggest that it is not even that of a geometric progression.


5. LINEARIZATION METHOD


In this section we shall describe the method of solving the general 
problem of mathematical programming without making any assump- 
tion concerning the convexity of the functions to be dealt with. An 
important property of this method is the possibility of taking into 
account nonlinear equality constraints, this being a stumbling- 
block for most other methods. 

It is required to minimize function $f_0(x)$, $x \in E^n$ with constraints

$$f_i(x) \le 0, \ i \in \mathcal{J}^-, \quad f_i(x) = 0, \ i \in \mathcal{J}^0 \tag{5.1}$$

where $\mathcal{J}^-$ and $\mathcal{J}^0$ are finite sets of indices. We assume that all the functions $f_i(x)$ are continuously differentiable. (The conditions under which the problem is considered will be specified more fully below.) At point $x_0$ we substitute linear constraints for all (5.1) and a linear function for $f_0(x)$ by linearizing $f_i(x)$ at point $x_0$. As a result we obtain a problem of linear programming. It would be natural to take the solution of the linearized problem as the next approximation as we do in Newton's method for solving systems of nonlinear equations. Unfortunately, this way does not lead directly to the aim since as a rule the subsidiary problem of linear programming has no solution. Therefore, it is necessary to impose certain constraints on the increment of vector $x$ at $x_0$ in order that the solution of the linearized problem should not shift too far from $x_0$ and should remain in a neighbourhood of $x_0$ in which the linearization is still valid. This will be performed below by adding a quadratic term to the linearized objective function.

Note that each of the equalities $f_i(x) = 0$ is equivalent to the following two inequalities

$$f_i(x) \le 0, \quad -f_i(x) \le 0.$$

Therefore we can limit ourselves to considering only the case with inequality constraints. Such a reduction is convenient at least in the theoretical substantiation of the algorithm though the doubling of the number of inequalities can be inconvenient in calculations. We shall give below the theoretical substantiation of the algorithm for the problem of the minimization of $f_0(x)$ with constraints

$$f_i(x) \le 0, \quad i \in \mathcal{J}. \tag{5.2}$$

A modification of the algorithm for the general problem (5.1) will be considered separately.

Thus we shall study the algorithm for problem (5.2) without loss of generality. Clearly, we can always assume that among the inequalities (5.2), there is the trivial one: $0 \le 0$. Therefore it will be assumed that among the functions $f_i(x)$, $i \in \mathcal{J}$ there is one identically equal to zero: $f_i(x) \equiv 0$.


Basic Assumptions 


We set

$$F(x) = \max_{i \in \mathcal{J}} f_i(x),$$

$$\mathcal{J}_\delta(x) = \{i \in \mathcal{J} : f_i(x) \ge F(x) - \delta\}, \quad \delta > 0. \tag{5.3}$$

By assumption, $F(x) \ge 0$ with all the $x$. Suppose that there are constants $N > 0$, $\delta > 0$ such that:

(a) the set

$$\Omega_N = \{x : f_0(x) + NF(x) \le C_0\}, \quad C_0 = f_0(x_0) + NF(x_0)$$

is bounded;

(b) the gradients of functions $f_i(x)$, $i \in \{0\} \cup \mathcal{J}$ in $\Omega_N$ satisfy Lipschitz' condition, i.e.

$$\|f_i'(x_1) - f_i'(x_2)\| \le L\|x_1 - x_2\|;$$

(c) the problem of quadratic programming

$$\min_p \ (f_0'(x), p) + \tfrac{1}{2}\|p\|^2,$$
$$(f_i'(x), p) + f_i(x) \le 0, \quad i \in \mathcal{J}_\delta(x) \tag{5.4}$$

is solvable for $p \in E^n$ with any $x \in \Omega_N$ and there are Lagrange multipliers $u^i(x)$, $i \in \mathcal{J}_\delta(x)$ such that $\sum_{i \in \mathcal{J}_\delta(x)} u^i(x) \le N$. In this section, $\|p\|$ will always denote the Euclidean norm of vector $p$.

In what follows we shall denote the solution of problem (5.4) by $p(x)$ and Lagrange multipliers by $u^i(x)$, $i \in \mathcal{J}_\delta(x)$.


Formulation of the Algorithm


Let $x_0$ be the initial approximation and we take $\varepsilon$ such that $0 < \varepsilon < 1$. Let point $x_k$ be already constructed by the algorithm. The construction of the next approximation will be performed in two stages:

(1) We solve problem (5.4) with $x = x_k$ and find its solution, vector $p_k = p(x_k)$.

(2) We find the first of the values of $i = 0, 1, \dots$, satisfying the inequality

$$f_0\!\left(x_k + \frac{1}{2^i}p_k\right) + NF\!\left(x_k + \frac{1}{2^i}p_k\right) \le f_0(x_k) + NF(x_k) - \frac{1}{2^i}\varepsilon\|p_k\|^2.$$

If this inequality is satisfied for the first time with $i = i_0$, then we take $\alpha_k = 2^{-i_0}$, $x_{k+1} = x_k + \alpha_k p_k$.

Thus the following inequality is satisfied at each step:

$$f_0(x_{k+1}) + NF(x_{k+1}) \le f_0(x_k) + NF(x_k) - \alpha_k\varepsilon\|p_k\|^2. \tag{5.5}$$
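
The two stages can be traced on a small example. The sketch below is restricted, as an assumption made only for illustration, to a problem with a single nontrivial constraint $f_1(x) \le 0$ (plus the trivial constraint $0 \le 0$), because in that case the quadratic subproblem (5.4) has the closed-form solution $p = -f_0'(x) - u f_1'(x)$ with $u = \max\{0, (f_1(x) - (f_0'(x), f_1'(x)))/\|f_1'(x)\|^2\}$. The test problem, the parameter values $N = 2$, $\delta = 1$, $\varepsilon = 0.5$ and the function name are hypothetical.

```python
def linearization_demo(x0, N=2.0, delta=1.0, eps=0.5, tol=1e-10, max_iter=50):
    """Linearization method for: minimize x1^2 + x2^2 subject to 1 - x1 - x2 <= 0."""
    f0 = lambda x: x[0] ** 2 + x[1] ** 2
    g0 = lambda x: (2.0 * x[0], 2.0 * x[1])
    f1 = lambda x: 1.0 - x[0] - x[1]
    g1 = lambda x: (-1.0, -1.0)
    F = lambda x: max(f1(x), 0.0)          # the trivial constraint 0 <= 0 is included
    merit = lambda x: f0(x) + N * F(x)

    x, u = tuple(x0), 0.0
    for _ in range(max_iter):
        a, b = g0(x), g1(x)
        # Stage (1): solve (5.4); f1 enters only if its index is in J_delta(x).
        if f1(x) >= F(x) - delta:
            u = max(0.0, (f1(x) - (a[0] * b[0] + a[1] * b[1])) / (b[0] ** 2 + b[1] ** 2))
        else:
            u = 0.0
        p = (-a[0] - u * b[0], -a[1] - u * b[1])
        p2 = p[0] ** 2 + p[1] ** 2
        if p2 <= tol:                       # p(x) = 0: necessary conditions hold
            break
        # Stage (2): halve the step until inequality (5.5) is satisfied.
        alpha = 1.0
        while merit((x[0] + alpha * p[0], x[1] + alpha * p[1])) > merit(x) - alpha * eps * p2:
            alpha *= 0.5
        x = (x[0] + alpha * p[0], x[1] + alpha * p[1])
    return x, u

x_lin, u_lin = linearization_demo((2.0, 0.0))
```

Starting from $(2, 0)$, the first step is accepted only after one halving, the second with $\alpha_k = 1$, and the third subproblem returns $p = 0$ at the point $(0.5, 0.5)$ with multiplier $u = 1$, which does not exceed the chosen $N$.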


Convergence of the Algorithm 


We show that the choice of the step $\alpha_k$ at each iteration is performed after a finite number of successive halvings of unity and substantiate the convergence of the algorithm.

From the results of Sec. 3, Chap. I it follows that $p(x)$ is the solution of problem (5.4) if and only if there are $u^i(x) \ge 0$, $i \in \mathcal{J}_\delta(x)$ such that

$$f_0'(x) + p(x) + \sum_{i \in \mathcal{J}_\delta(x)} u^i(x) f_i'(x) = 0,$$
$$u^i(x)\,((f_i'(x), p(x)) + f_i(x)) = 0, \quad i \in \mathcal{J}_\delta(x). \tag{5.6}$$

Therefore

$$(f_0'(x), p(x)) = -\sum_{i \in \mathcal{J}_\delta(x)} u^i(x)(f_i'(x), p(x)) - \|p(x)\|^2 = \sum_{i \in \mathcal{J}_\delta(x)} u^i(x) f_i(x) - \|p(x)\|^2. \tag{5.7}$$

Lemma 5.1. In order that point $x$ satisfy inequalities (5.2) and the necessary conditions for the minimum of $f_0(x)$ with constraints (5.2), it is necessary and sufficient that the equality $p(x) = 0$ be satisfied.

Proof. Let point $x$ satisfy (5.2) and the necessary conditions for the minimum of $f_0(x)$. Then there are numbers $u^i \ge 0$, $i \in \mathcal{J}$ such that

$$f_0'(x) + \sum_{i \in \mathcal{J}} u^i f_i'(x) = 0, \quad u^i f_i(x) = 0, \quad i \in \mathcal{J}. \tag{5.8}$$

If $x$ satisfies (5.2), then $F(x) = 0$ and therefore $\mathcal{J}_0(x)$ coincides with the set of $i$ for which $f_i(x) = 0$. Besides, from the second of relations (5.8), $u^i = 0$ if $f_i(x) < 0$, i.e. if $i \notin \mathcal{J}_0(x)$. Therefore taking into account that $\mathcal{J}_\delta(x) \supset \mathcal{J}_0(x)$ we can rewrite (5.8) in the following form:

$$f_0'(x) + \sum_{i \in \mathcal{J}_\delta(x)} u^i f_i'(x) = 0, \quad u^i f_i(x) = 0, \quad i \in \mathcal{J}_\delta(x).$$

The comparison of the last expressions with (5.6) shows that vector $p = 0$ is the solution of problem (5.4), for with $p = 0$ all constraints (5.4) are satisfied (since (5.2) are satisfied) and the fulfilment of (5.6) with $p = 0$ is the necessary and sufficient condition for vector $p = 0$ to be the solution of (5.4).

Let now $p(x) = 0$. This means that the constraints of (5.4) are satisfied with $p = 0$, i.e. $f_i(x) \le 0$, $i \in \mathcal{J}_\delta(x)$. Since for $i \notin \mathcal{J}_\delta(x)$ we have

$$f_i(x) \le F(x) - \delta \le f_j(x) \le 0$$

where $j \in \mathcal{J}_0(x)$, then point $x$ satisfies all constraints (5.2). Besides, with $p = 0$ relations (5.6) are transformed into (5.8) if we take $u^i = 0$, $i \notin \mathcal{J}_\delta(x)$. Thus, the necessary conditions for the minimum of $f_0(x)$ with constraints (5.2) are satisfied too and this completes the proof.


Let us now estimate the changes of all the functions comprised by the problem with a shift from point $x_k$ in direction $p_k$.
For $i \in \mathcal{J}_\delta(x_k)$ using Taylor's formula we obtain:

$$f_i(x_k + \alpha p_k) = f_i(x_k) + \alpha(p_k, f_i'(x_k)) + \alpha(p_k, f_i'(\theta_i) - f_i'(x_k))$$

where $\theta_i = x_k + \alpha\xi_i p_k$, $0 \le \xi_i \le 1$. Since $p_k$ is the solution of (5.4) with $x = x_k$, we have

$$f_i(x_k + \alpha p_k) \le f_i(x_k) - \alpha f_i(x_k) + \alpha^2\|p_k\|^2 L \le (1 - \alpha)F(x_k) + \alpha^2\|p_k\|^2 L \tag{5.9}$$

in deriving which we made use of the fact that the gradients of $f_i(x)$ satisfy Lipschitz' condition.
For $i \notin \mathcal{J}_\delta(x_k)$,

$$f_i(x_k + \alpha p_k) = f_i(x_k) + \alpha(p_k, f_i'(\theta_i)) \le F(x_k) - \delta + \alpha K\|p_k\| \tag{5.10}$$

where $K$ is a quantity which bounds $\|f_i'(x)\|$ in $\Omega_N$.
Since

$$(1 - \alpha)F(x_k) \ge F(x_k) - \delta + \alpha K\|p_k\|$$

for $\alpha$ such that $\alpha \le 1$,

$$0 \le \alpha \le \frac{\delta}{F(x_k) + K\|p_k\|}, \tag{5.11}$$

it follows from (5.9) and (5.10) that for all the $i$ the following inequality holds:

$$f_i(x_k + \alpha p_k) \le (1 - \alpha)F(x_k) + \alpha^2 L\|p_k\|^2 \tag{5.12}$$

provided $\alpha$ satisfies condition (5.11).
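By analogy to the preceding estimates,

$$f_0(x_k + \alpha p_k) = f_0(x_k) + \alpha(p_k, f_0'(x_k)) + \alpha(p_k, f_0'(\theta_0) - f_0'(x_k)),$$
$$\theta_0 = x_k + \alpha\xi_0 p_k, \quad 0 \le \xi_0 \le 1.$$

Using (5.7) and Lipschitz' condition for gradients we obtain

$$f_0(x_k + \alpha p_k) \le f_0(x_k) + \alpha\sum_{i \in \mathcal{J}_\delta(x_k)} u^i(x_k) f_i(x_k) - \alpha\|p_k\|^2 + \alpha^2 L\|p_k\|^2.$$

Hence and from (5.12) it follows that

$$f_0(x_k + \alpha p_k) + NF(x_k + \alpha p_k) \le f_0(x_k) + NF(x_k) + \alpha\Big(\sum_{i \in \mathcal{J}_\delta(x_k)} u^i(x_k) f_i(x_k) - NF(x_k)\Big) - \alpha\|p_k\|^2 + \alpha^2(N + 1)L\|p_k\|^2. \tag{5.13}$$

Recall now that $u^i(x_k) \ge 0$, $F(x_k) \ge 0$ and

$$\sum_{i \in \mathcal{J}_\delta(x_k)} u^i(x_k) \le N.$$

Therefore

$$\sum_{i \in \mathcal{J}_\delta(x_k)} u^i(x_k) f_i(x_k) - NF(x_k) \le 0.$$

But then (5.13) can be written in the form

$$f_0(x_k + \alpha p_k) + NF(x_k + \alpha p_k) \le f_0(x_k) + NF(x_k) - \alpha\|p_k\|^2(1 - \alpha(N + 1)L)$$

or, if

$$\alpha \le \frac{1 - \varepsilon}{(N + 1)L}, \tag{5.14}$$

then

$$f_0(x_k + \alpha p_k) + NF(x_k + \alpha p_k) \le f_0(x_k) + NF(x_k) - \alpha\varepsilon\|p_k\|^2. \tag{5.15}$$

Thus if

$$0 \le \alpha \le \bar{\alpha}_k, \quad \bar{\alpha}_k = \min\left\{1, \ \frac{\delta}{F(x_k) + K\|p_k\|}, \ \frac{1 - \varepsilon}{(N + 1)L}\right\},$$

then the inequality (5.15) holds.

But this means that inequality (5.5) is satisfied after a finite number of trials of $\alpha = 2^{-i}$, $i = 0, 1, \dots$, and with the inequality

$$\alpha_k \ge \tfrac{1}{2}\bar{\alpha}_k. \tag{5.16}$$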


We shall prove now the following theorem on the convergence of the process.

Theorem 5.1. If the assumptions of the subsection "Basic Assumptions" hold, then the process has the following properties:

(a) $F(x_k) \to 0$ as $k \to \infty$;

(b) at any limit point $x_*$ of sequence $\{x_k\}$, $k = 0, 1, \dots$, inequalities (5.2) are satisfied and also the necessary conditions for the minimum of $f_0(x)$ with constraints (5.2).

Remark. The fact that $F(x_k)$ tends to zero means that sequence $\{x_k\}$ satisfies constraints (5.2) more and more precisely.

Proof. All points of $\{x_k\}$ belong to $\Omega_N$ since function $f_0(x) + NF(x)$, by (5.15), decreases from step to step. Further, as $\Omega_N$ is a compact set, $f_0(x) + NF(x)$ is bounded in this set since the function is continuous. Hence

$$\alpha_k\|p_k\|^2 \to 0 \tag{5.17}$$

as $k \to \infty$, for otherwise $f_0(x) + NF(x)$ decreases without limit along sequence $\{x_k\}$.

We shall prove now that $p_k \to 0$. Indeed, if $p_k$ does not tend to zero, then it follows from (5.17) that $\alpha_k \to 0$ along a certain subsequence of indices $k$. But it follows from (5.16) and the expression of $\bar{\alpha}_k$ that then for great $k$

$$\alpha_k \ge \frac{1}{2}\bar{\alpha}_k = \frac{1}{2}\,\frac{\delta}{F(x_k) + K\|p_k\|}.$$

Therefore, the right-hand part of the last inequality must tend to zero. As $F(x)$ is a continuous function in the compact set $\Omega_N$, $F(x)$ has an upper bound and the expression $\delta/(F(x_k) + K\|p_k\|)$ can tend to zero only if $\|p_k\| \to \infty$. But from (5.6) we obtain that

$$\|p(x_k)\| = \Big\|f_0'(x_k) + \sum_{i \in \mathcal{J}_\delta(x_k)} u^i(x_k) f_i'(x_k)\Big\| \le K(N + 1).$$

Thus we have come to a contradiction through the assumption that $p_k$ does not tend to zero.
By definition of $p_k$, the relations

$$(f_i'(x_k), p_k) + f_i(x_k) \le 0, \quad i \in \mathcal{J}_\delta(x_k)$$

hold. Therefore

$$f_i(x_k) \le -(f_i'(x_k), p_k) \le K\|p_k\|, \quad i \in \mathcal{J}_\delta(x_k).$$

But $f_j(x_k) \le f_i(x_k)$, $j \notin \mathcal{J}_\delta(x_k)$, $i \in \mathcal{J}_\delta(x_k)$. Hence

$$F(x_k) = \max_{i \in \mathcal{J}} f_i(x_k) \le K\|p_k\|.$$

Consequently, $F(x_k) \to 0$ as $k \to \infty$, for $F(x_k) \ge 0$. Further, let us take $u^i(x_k) = 0$, $i \notin \mathcal{J}_\delta(x_k)$. Then we can rewrite (5.6) along sequence $\{x_k\}$ in the following form:

$$f_0'(x_k) + p_k + \sum_{i \in \mathcal{J}} u^i(x_k) f_i'(x_k) = 0,$$
$$u^i(x_k)\,((f_i'(x_k), p_k) + f_i(x_k)) = 0, \quad i \in \mathcal{J}. \tag{5.18}$$


Let now $x_*$ be a limit point of $\{x_k\}$. As $x_k \in \Omega_N$ and $\Omega_N$ is compact, there are always such points. Without loss of generality, we may take that $x_k \to x_*$. Besides, since $u^i(x_k) \ge 0$, $i \in \mathcal{J}$ and their sum is bounded, we can take that $u^i(x_k) \to u^i$ as $k \to \infty$.

Taking the limit in (5.18) we obtain:

$$f_0'(x_*) + \sum_{i \in \mathcal{J}} u^i f_i'(x_*) = 0, \quad u^i f_i(x_*) = 0, \quad i \in \mathcal{J}.$$

Besides, $u^i \ge 0$ since $u^i(x_k) \ge 0$ and point $x_*$ satisfies all the constraints (5.2) since $f_i(x_k) \le F(x_k)$ and $F(x_k) \to 0$. Hence taking the limit, we obtain $f_i(x_*) \le 0$. Thus we have ascertained that the necessary conditions for a minimum are fulfilled at point $x_*$. The theorem is proved.

Corollary. If the only point at which the necessary conditions for a minimum are fulfilled is the minimum point, then the sequence generated by the algorithm converges to the minimum point of $f_0(x)$ with constraints (5.2).

Indeed, in this case by theorem 5.1, the only limit point of sequence $\{x_k\}$ can be but the minimum point.


Computational Aspects 


The basic operation which requires considerable computations at every step in implementing the algorithm is the solving of problem (5.4). This is a problem of quadratic programming. In choosing the method for solving this problem it is necessary to take into account that problem (5.4) must be solved after a finite number of steps, since it is a subsidiary problem. Besides, since constant $N$ is not known beforehand, it is expedient to obtain the corresponding Lagrange multipliers $u^i(x)$ in solving problem (5.4) in order to check whether the choice of $N$ was right or not. On these grounds it seems expedient to pass to the dual problem and to solve it by the method of conjugate gradients which was discussed in the subsection on p. 160.

We shall construct now the dual of problem (5.4). As stated in Chap. I, Sec. 3, the objective function of the dual problem has the following form:

$$\varphi(u) = \min_p\Big[(f_0'(x), p) + \tfrac{1}{2}\|p\|^2 + \sum_{i \in \mathcal{J}_\delta(x)} u^i\,((f_i'(x), p) + f_i(x))\Big]. \tag{5.19}$$

1€ 7 6 (x) 


Equating to zero the derivatives with respect to $p$ of the right-hand side of the last equality, we find that the minimum is attained with

$$p = -f_0'(x) - \sum_{i \in \mathcal{J}_\delta(x)} u^i f_i'(x). \tag{5.20}$$

Thus, point $p$ is uniquely defined by vector $u$ with components $u^i$, $i \in \mathcal{J}_\delta(x)$.

Substituting (5.20) into the right-hand side of (5.19), we obtain

$$\varphi(u) = -\frac{1}{2}\Big\|f_0'(x) + \sum_{i \in \mathcal{J}_\delta(x)} u^i f_i'(x)\Big\|^2 + \sum_{i \in \mathcal{J}_\delta(x)} u^i f_i(x). \tag{5.21}$$

Thus, we have calculated the objective function of the dual problem. The dual problem consists now in the maximization of $\varphi(u)$ with constraints $u^i \ge 0$, $i \in \mathcal{J}_\delta(x)$.

Thus we have come to a problem of the maximization of a quadratic form with simple constraints; it is expedient to solve this problem by the method of conjugate gradients (the subsection on p. 160).

As a result of the solution of the dual problem we obtain Lagrange multipliers $u^i(x)$ and according to what was stated in Sec. 3, Chap. I, the substitution of $u^i(x)$ into (5.20) gives vector $p(x)$, the solution of the primal problem.
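
The passage from the dual variables to the primal direction can be illustrated with a short sketch. It maximizes $\varphi(u)$ of (5.21) over $u \ge 0$ by projected gradient ascent, which is a simpler stand-in for the method of conjugate gradients recommended in the text, and then recovers $p$ by formula (5.20). The gradient used is $\partial\varphi/\partial u^i = (f_i'(x), p) + f_i(x)$ with $p$ given by (5.20). The function names, the fixed step size and the numerical data are assumptions made for the example; at least one constraint gradient is assumed to be nonzero.

```python
def direction(g0, gs, u):
    """Formula (5.20): p = -f0'(x) - sum_i u^i f_i'(x)."""
    n = len(g0)
    return [-g0[j] - sum(ui * gi[j] for ui, gi in zip(u, gs)) for j in range(n)]

def solve_dual(g0, gs, fs, iters=300):
    """Maximize phi(u) of (5.21) subject to u >= 0 by projected gradient ascent.

    g0 -- gradient f0'(x); gs -- list of gradients f_i'(x); fs -- values f_i(x).
    """
    n = len(g0)
    # 1 / trace(A A^T) never exceeds 1 / (largest eigenvalue): a safe fixed step.
    step = 1.0 / sum(a * a for g in gs for a in g)
    u = [0.0] * len(gs)
    for _ in range(iters):
        p = direction(g0, gs, u)
        grad = [sum(gi[j] * p[j] for j in range(n)) + fi for gi, fi in zip(gs, fs)]
        u = [max(0.0, ui + step * d) for ui, d in zip(u, grad)]   # projection on u >= 0
    return u, direction(g0, gs, u)

# Hypothetical data: linearized constraints p1 <= 0.5, p2 <= -0.2, p1 + p2 <= 5.
g0 = [-1.0, -1.0]
gs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fs = [-0.5, 0.2, -5.0]
u_dual, p_dual = solve_dual(g0, gs, fs)
```

For these data the first two constraints are active with multipliers $0.5$ and $1.2$, the third is inactive with multiplier $0$, and the recovered direction is $p = (0.5, -0.2)$; the sum of the multipliers is the quantity to be compared with $N$.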

Another problem is the choosing of constants $N$ and $\delta$. Generally speaking, the quantity $N$ is unknown. Choosing it too great can be a disadvantage, since by formula (5.14), this can involve a considerable reduction of the step. Therefore, it is expedient to estimate $N$ during the implementation of the algorithm. For example, if at a certain step it occurred that

$$N < \sum_{i \in \mathcal{J}_\delta(x_k)} u^i(x_k),$$

then $N$ should be changed to

$$N = 2\sum_{i \in \mathcal{J}_\delta(x_k)} u^i(x_k). \tag{5.22}$$

Experience shows that such a correction brings success. Besides, it is clear from theoretical reasons that if $x_k$ is sufficiently close to the limit point, then in the regular case $u^i(x_k)$ proves close to Lagrange multipliers at point $x_*$ which is the solution of the problem and therefore formula (5.22) leads to success. The behaviour of multipliers $u^i(x_k)$ will be considered at greater length below.

As to quantity $\delta$, it should be reduced if subsidiary problem (5.4) proves insolvable at a certain step.
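
The correction rule (5.22) amounts to a one-line update. The following minimal sketch states it as a function; the name `update_penalty` is hypothetical and the multipliers are assumed to be supplied by whatever solver is used for problem (5.4).

```python
def update_penalty(N, multipliers):
    """Rule (5.22): if N is smaller than the sum of the current Lagrange
    multipliers u^i(x_k), replace it by twice that sum; otherwise keep N."""
    total = sum(multipliers)
    return 2.0 * total if N < total else N

N_kept = update_penalty(2.0, [0.5, 1.2, 0.0])     # 2.0 >= 1.7, N is unchanged
N_raised = update_penalty(1.0, [0.5, 1.2, 0.0])   # 1.0 < 1.7, N becomes 3.4
```

Doubling the sum rather than setting $N$ equal to it leaves a margin, so that small changes of the multipliers at the following steps do not trigger a new correction each time.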

We shall now describe the conditions under which there exist constants $N$ and $\delta$. In fact, they exist in a considerably broader class of problems.

Theorem 5.2. Let all functions $f_0(x)$, $f_i(x)$, $i \in \mathcal{J}$ be convex and there be a point $\bar{x}$ such that

$$f_i(\bar{x}) < 0, \quad i \in \mathcal{J}.$$

Besides, let $f_0(x)$ tend to $+\infty$ as $\|x\| \to \infty$ and let point $x_0$ satisfy constraints (5.2). Then with any $\delta > 0$, multipliers $u^i(x)$, $i \in \mathcal{J}_\delta(x)$ have an upper bound on set $\Omega_N$ with sufficiently great $N$, and $\Omega_N$ is compact.
Proof. Recall that 


Qy = {x: fo (z) + NF (x) < fo (Zo) + NF (x9) }. 


Hence and from the continuity of f, (x) and F (x), it follows that 
Q,, is a closed set. On the other hand, 82, is bounded since by hy- 
pothesis, f, (x) —-+ o as x ~+-++ oo and therefore 


fo (x) + NF (x) > fo (Xo) + NF (2) 


with all x sufficiently great in norm. Further, since zy satisfies (5.2), 
then F (x,) = 0. Therefore with all N we have Qy CQp,). Indeed, 
it follows from z € 2, that 


fo (4) < fo (2) + NF (2) S fo (40) + NF (20) = fo (Xo): 


i.e. fy (x) < fy (Xp). Obviously, set 2, is also compact by the assump- 
tions of the theorem. 
Further, since all f; (z) are convex we have 


fi(z) + (fi(z), e— 2) <f; (@) <0. (9.23) 
Therefore, the system of constraints of problem (9.4) is consistent 


with any 6 > 0, as vector p = z — z satisfies it. 
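Let now $u^i(x)$ be the Lagrange multipliers of problem (5.4). Then
by the Kuhn-Tucker theorem

$$\tfrac{1}{2}\|p(x)\|^2 + (f_0'(x), p(x)) \le \tfrac{1}{2}\|p\|^2 + (f_0'(x), p) + \sum_{i \in J_\delta(x)} u^i(x)\,[(f_i'(x), p) + f_i(x)]$$

for all $p$. In particular, with $p = \bar{p} = \bar{x} - x$, by (5.23), we have

$$\begin{aligned}
\tfrac{1}{2}\|p(x)\|^2 + (f_0'(x), p(x))
&\le \tfrac{1}{2}\|\bar{p}\|^2 + (f_0'(x), \bar{p}) + \sum_{i \in J_\delta(x)} u^i(x)\,[(f_i'(x), \bar{p}) + f_i(x)] \\
&\le \tfrac{1}{2}\|\bar{p}\|^2 + (f_0'(x), \bar{p}) + \sum_{i \in J_\delta(x)} u^i(x) f_i(\bar{x}) \\
&\le \tfrac{1}{2}\|\bar{p}\|^2 + (f_0'(x), \bar{p}) + u^i(x) f_i(\bar{x}),
\end{aligned}$$

the last inequality holding for each $i \in J_\delta(x)$ because every term of the sum is nonpositive.

Hence, since $f_i(\bar{x}) < 0$,

$$u^i(x) \le \frac{\big[\tfrac{1}{2}\|p(x)\|^2 + (f_0'(x), p(x))\big] - \big[\tfrac{1}{2}\|\bar{p}\|^2 + (f_0'(x), \bar{p})\big]}{f_i(\bar{x})} \tag{5.24}$$

where the numerator on the right-hand side of (5.24) is a nonpositive
quantity since $p(x)$ is the solution of problem (5.4) and
$\bar{p}$ satisfies the constraints of (5.4).

We shall show now that the right-hand side of (5.24) is bounded
in $\Omega_0$. Indeed, since the functions $f_i(x)$ are continuously differentiable,
the quantity

$$\tfrac{1}{2}\|\bar{p}\|^2 + (f_0'(x), \bar{p}) = \tfrac{1}{2}\|\bar{x} - x\|^2 + (f_0'(x), \bar{x} - x)$$

is bounded in the compact region $\Omega_0$. Therefore, the smaller quantity

$$\tfrac{1}{2}\|p(x)\|^2 + (f_0'(x), p(x))$$

has an upper bound. As to its lower bound we have

$$\tfrac{1}{2}\|p(x)\|^2 + (f_0'(x), p(x)) \ge \tfrac{1}{2}\|p(x)\|^2 - \|f_0'(x)\|\,\|p(x)\| \ge -\tfrac{1}{2}\|f_0'(x)\|^2,$$

i.e. with $x \in \Omega_0$ the quantity under consideration also has a lower
bound.

Thus we have shown that in $\Omega_0$ the right-hand sides of (5.24) have
upper bounds, i.e. $u^i(x) \le M$, $x \in \Omega_0$. The statement of the theorem
directly follows from this fact.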

Thus if the primal problem is a problem of convex programming,
then any $\delta > 0$ suits the algorithm, provided the admissible
region contains an interior point.

Some Generalizations


At the beginning of this section it was stated that if there are
equality constraints, i.e. if the constraints are of the form (5.1),
then the problem is reduced to the form (5.2) by substituting two
inequalities for each equality.

Thus, the algorithm can be applied to the general problem (5.1)
too. It should only be taken into account that if for a certain $x$
we have

$$f_i(x) \ge F(x) - \delta \quad \text{and} \quad -f_i(x) \ge F(x) - \delta$$

where $i \in J^0$, then the system (5.4) comprises two inequalities

$$(f_i'(x), p) + f_i(x) \le 0, \qquad -(f_i'(x), p) - f_i(x) \le 0 \tag{5.25}$$

which are equivalent to one equality

$$(f_i'(x), p) + f_i(x) = 0. \tag{5.26}$$

Therefore it is expedient to substitute in (5.4) one equality (5.26)
for each pair of inequalities of the type (5.25). In passing to the dual
problem this will lead to the corresponding multiplier $u^i$ having
an arbitrary sign, which however does not prevent
applying the algorithm of conjugate gradients (the subsection on
p. 160).

Suppose now that in the primal problem, in addition to constraints
(5.2), there is a constraint requiring that the point $x$
belong to a set $X$ of simple structure. In this case it is expedient that
the approximations obtained should lie in the set $X$. We shall now describe
how the algorithm is to be modified in this case. As before,
we shall consider, without loss of generality, only the case
with inequality constraints.

Thus let it be required to minimize $f_0(x)$, $x \in E^n$, with constraints

$$f_i(x) \le 0, \; i \in J, \qquad x \in X \tag{5.27}$$

where $J$ is a finite set of indices and $X$ is a convex closed set. It
is assumed that there is an index $i$ such that $f_i(x) \equiv 0$.

Suppose that there are constants $N > 0$ and $\delta > 0$ such that the
following conditions are fulfilled:

(a) the set

$$\Omega_N = \{x : f_0(x) + N F(x) \le C_0, \; x \in X\}, \qquad C_0 = f_0(x_0) + N F(x_0),$$

is bounded and the initial approximation $x_0$ belongs to $X$;

(b) the gradients of the functions $f_i(x)$, $i \in \{0\} \cup J$, satisfy
Lipschitz' condition in $\Omega_N$, i.e.

$$\|f_i'(x_1) - f_i'(x_2)\| \le L \|x_1 - x_2\|;$$

(c) the problem

$$\min_p \; (f_0'(x), p) + \tfrac{1}{2}\|p\|^2,$$

$$(f_i'(x), p) + f_i(x) \le 0, \; i \in J_\delta(x), \qquad x + p \in X, \tag{5.28}$$

is solvable for $p$ with any $x \in \Omega_N$ and there are Lagrange multipliers
$u^i(x)$, $i \in J_\delta(x)$, such that

$$\sum_{i \in J_\delta(x)} u^i(x) \le N.$$


Remark. Recall that Lagrange multipliers for problem (5.28) are
any nonnegative numbers $u^i(x)$ such that the following conditions are
satisfied:

$$\begin{aligned}
&(f_0'(x), p(x)) + (p(x), p(x)) + \sum_{i \in J_\delta(x)} u^i(x)\,[(f_i'(x), p(x)) + f_i(x)] \\
&\qquad \le (f_0'(x), p) + (p(x), p) + \sum_{i \in J_\delta(x)} u^i(x)\,[(f_i'(x), p) + f_i(x)]
\end{aligned} \tag{5.29}$$

for all $p$ such that

$$x + p \in X. \tag{5.30}$$

Besides,

$$u^i(x)\,[(f_i'(x), p(x)) + f_i(x)] = 0, \quad i \in J_\delta(x). \tag{5.31}$$

Thus condition (c) implies not only that the subsidiary problem (5.28)
is solvable, but also that the minimum point $p = p(x)$ satisfies the
necessary and sufficient conditions required by the Kuhn-Tucker
theorem.

The algorithm for solving problem (5.27) is now constructed as
it was expounded in the subsection on p. 190. The only difference is that we now take as
$p_k$ the vector $p(x_k)$ which is the solution of the new subsidiary problem
(5.28).

We shall show that the algorithm is convergent, i.e. that the
conclusions of theorem 5.1 hold and also that $x_k \in X$ for all $k$. It follows
from the last assertion that any limit point of the sequence $\{x_k\}$ lies in
$X$. Since the proof of convergence differs from the proof of theorem
5.1 only in some details, there is no need to give this proof
completely. We shall point out only the main specific details.

First, since $x_k + p_k \in X$ and $X$ is convex, we have $x_k + \alpha p_k \in X$
for all $\alpha$ lying between 0 and 1. Therefore if $x_k \in X$, then $x_{k+1} \in X$
too. And since $x_0 \in X$ by assumption, the whole sequence $\{x_k\}$, $k \ge 0$,
lies in $X$. Secondly, from (5.29)-(5.31) with $p = 0$ we obtain that

$$(f_0'(x), p(x)) + \|p(x)\|^2 \le \sum_{i \in J_\delta(x)} u^i(x) f_i(x),$$

i.e.

$$(f_0'(x), p(x)) \le \sum_{i \in J_\delta(x)} u^i(x) f_i(x) - \|p(x)\|^2. \tag{5.32}$$

This inequality should be substituted for expression (5.7) which was
used in obtaining estimate (5.13). All other calculations made in
obtaining the estimates remain unchanged.

Finally, if at a point $x_*$ we have $p(x_*) = 0$, then it follows from
(5.29)-(5.31) that the conditions

$$(f_0'(x_*), p) + \sum_{i \in J_\delta(x_*)} u^i(x_*)\,(f_i'(x_*), p) \ge 0, \quad x_* + p \in X, \qquad u^i(x_*) f_i(x_*) = 0, \; i \in J_\delta(x_*) \tag{5.33}$$

are fulfilled.
Besides, in this case it follows from (5.28) that

$$f_i(x_*) \le 0, \; i \in J_\delta(x_*), \qquad x_* \in X,$$

and it is also obvious that

$$f_i(x_*) < 0, \quad i \notin J_\delta(x_*).$$

Thus the point $x_*$ satisfies all constraints (5.27), and (5.33) shows that
at this point the necessary conditions for an extremum are fulfilled.

So we have shown, as we did above, that if $p(x_*) = 0$, then at
the point $x_*$ the necessary conditions for an extremum are satisfied. It
is easy to show that the converse also holds, i.e. that the condition
$p(x) = 0$ is necessary and sufficient for the point $x$ to satisfy the
necessary conditions for an extremum.

The proof that every limit point $x_*$ of the sequence $\{x_k\}$, $k = 0, 1, \ldots$,
satisfies the necessary conditions for an extremum is carried out
in just the same way as in proving theorem 5.1, i.e. by
taking the limit in passing from relations (5.29)-(5.31) satisfied at
the points $x_k$ to relations (5.33) satisfied at the limit point.


Problem of Linear Programming


Let now all functions $f_0(x)$, $f_i(x)$, $i \in J$, in problem (5.2) be linear.
We have then the problem of linear programming. Though the
algorithm described is mostly important for the nonlinear case, its
application to the problem of linear programming is also useful.
In particular, if the set $J$ comprises a great number of indices, then the
problem of linear programming is one with many constraints. At
the same time, with a small $\delta$ the subsidiary problem (5.4) has but
a small number of constraints so that the general problem is reduced
to the solving of a series of simpler problems. Besides, as distinct
from the simplex method, the method proposed does not accumulate
computation errors as it does not transform the original matrix of
constraints from step to step.

For the problem of linear programming the conditions (a) and (c)
(condition (b) is satisfied automatically) of the basic assumption are
stronger than is needed for the convergence of the algorithm. We shall not dwell
on the conditions of convergence for the problem of linear programming
since our main purpose is to obtain an algorithm for the nonlinear
case. It will be shown below that if the assumptions (a) and (c) hold for the
problem of linear programming, then the algorithm converges
after a finite number of steps. This fact characterizes to some extent
the rate of convergence of the algorithm.
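
A one-dimensional illustration may help to see how the method behaves on a linear program. The sketch below is not taken from the text: it assumes that the step rule (5.5), which is not restated in this subsection, has the form $f_0(x + \alpha p) + N F(x + \alpha p) \le f_0(x) + N F(x) - \alpha \varepsilon \|p\|^2$ with $\alpha$ obtained by halving unity, and it assumes that the subsidiary problem is always solvable. The function name `linearization_lp_1d`, the parameter values and the sample data are hypothetical.

```python
def linearization_lp_1d(c, cons, x, N=10.0, delta=0.5, eps=0.5, max_iter=100):
    """Minimize c*x subject to a_i*x - b_i <= 0 for (a_i, b_i) in cons.

    One-dimensional sketch of the linearization method; returns the list
    of iterates.  N, delta and eps are user-chosen parameters.
    """
    f = lambda i, y: cons[i][0] * y - cons[i][1]            # f_i(y)
    F = lambda y: max(0.0, *(f(i, y) for i in range(len(cons))))
    merit = lambda y: c * y + N * F(y)                       # f_0 + N*F
    xs = [x]
    for _ in range(max_iter):
        # Interval [lo, hi] of steps p with a_i*p + f_i(x) <= 0, i in J_delta(x)
        lo, hi = float("-inf"), float("inf")
        for i, (a, _) in enumerate(cons):
            if f(i, x) >= F(x) - delta and a != 0.0:
                bound = -f(i, x) / a
                if a > 0:
                    hi = min(hi, bound)
                else:
                    lo = max(lo, bound)
        # The minimizer of c*p + 0.5*p^2 on [lo, hi] is the point -c clamped
        p = min(max(-c, lo), hi)
        if p == 0.0:
            break                                            # p(x) = 0: stop
        alpha = 1.0
        while alpha > 1e-12 and merit(x + alpha * p) > merit(x) - alpha * eps * p * p:
            alpha *= 0.5                                     # halve the step
        x += alpha * p
        xs.append(x)
    return xs


# Hypothetical problem: minimize -x subject to x - 1 <= 0 and -x - 3 <= 0.
print(linearization_lp_1d(-1.0, [(1.0, 1.0), (-1.0, 3.0)], -2.5))
```

On this sample problem the iterates reach the vertex $x = 1$ exactly and the next subsidiary problem returns $p = 0$, which is the kind of finite termination the text announces for the linear case.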

Theorem 5.3. Let assumptions (a), (c) of the subsection (p. 189) hold
and let all functions $f_0(x)$, $f_i(x)$ which define problem (5.2) have the form

$$f_i(x) = (a_i, x) - b_i.$$

Then the algorithm of the subsection on p. 190 converges after a finite
number of steps.

Proof. Note from the beginning that in the case under consideration
the step $\alpha_k$ is equal to unity for sufficiently great $k$. Indeed, since
all $f_i(x)$ are linear, the Lipschitz constant $L$ is zero. It follows therefore
from the formula for $\bar{\alpha}_k$ on p. 190 that

$$\bar{\alpha}_k = \min\Big(1, \; \frac{\delta}{F(x_k) + K\|p_k\|}, \; \frac{1 - \varepsilon}{(N+1)L}\Big) = \min\Big(1, \; \frac{\delta}{F(x_k) + K\|p_k\|}\Big). \tag{5.34}$$

But it was proved above that $F(x_k) \to 0$, $\|p_k\| \to 0$. Therefore
for sufficiently great $k$ we have $\delta / (F(x_k) + K\|p_k\|) > 1$ and $\bar{\alpha}_k = 1$.

But the construction of $\bar{\alpha}_k$ is such that inequality (5.15) is fulfilled
with $\alpha = \bar{\alpha}_k$. Since at each iteration the choice of $\alpha_k$ begins with
$\alpha = 1$ and proceeds by halving, it follows that inequality (5.5), which determines
the choice of $\alpha_k$, will be satisfied immediately without any additional
halvings, and the step $\alpha_k$ will be just equal to 1.

Let the point $x_*$ be now a limit point of the sequence $\{x_k\}$ generated by
the algorithm. As we already know, this point is the solution of
problem (5.2), for it satisfies all the constraints of the problem and,
by theorem 5.1, the necessary conditions for a minimum as well;
in our problem of linear programming these conditions are also
sufficient.

We set

$$J_0(x_*) = \{i \in J : f_i(x_*) = 0\}. \tag{5.35}$$

Then $f_i(x_*) < 0$ for $i \notin J_0(x_*)$ so that

$$\xi = \max_{i \notin J_0(x_*)} f_i(x_*) < 0. \tag{5.36}$$

To simplify the notation in what follows we shall assume, without
loss of generality, that the whole sequence $\{x_k\}$ converges to $x_*$.
We consider now the subsidiary problem (5.4) at the points of $\{x_k\}$:

$$\min_p \; (f_0'(x_k), p) + \tfrac{1}{2}\|p\|^2,$$

$$(f_i'(x_k), p) + f_i(x_k) \le 0, \quad i \in J_\delta(x_k). \tag{5.37}$$

Its solution is $p_k = p(x_k)$. We denote the corresponding Lagrange
multipliers by $u_k^i$, $i \in J_\delta(x_k)$, so that

$$u_k^i\,[(f_i'(x_k), p_k) + f_i(x_k)] = 0. \tag{5.38}$$


Let us show now that $J_0(x_*) \subset J_\delta(x_k)$ for all sufficiently great $k$.
Indeed, if $i \notin J_\delta(x_k)$ for infinitely many $k$, then for these $k$

$$f_i(x_k) < F(x_k) - \delta$$

and, taking the limit with respect to $k$ and using $F(x_k) \to 0$, we obtain
that $f_i(x_*) \le -\delta$, which contradicts the assumption that
$i \in J_0(x_*)$.

Further, we introduce the notation

$$J(x_k) = \{i \in J_\delta(x_k) : u_k^i > 0\}.$$

We assert next that for great indices $k$,

$$J(x_k) \subset J_0(x_*). \tag{5.39}$$

Indeed, if $i \notin J_0(x_*)$, then $f_i(x_*) \le \xi$. Since $p_k \to 0$, the gradients $f_i'(x_k)$
are bounded and $x_k \to x_*$, we have for great $k$

$$|(f_i'(x_k), p_k)| \le -\frac{\xi}{4}, \qquad f_i(x_k) \le \frac{\xi}{2},$$

and therefore

$$(f_i'(x_k), p_k) + f_i(x_k) \le \frac{\xi}{4} < 0.$$

Therefore if $u_k^i > 0$, then

$$u_k^i\,[(f_i'(x_k), p_k) + f_i(x_k)] < 0,$$

but this contradicts (5.38).
Remark. In the argument no use was made of the linearity of
$f_i(x)$ and therefore the statements that $J_0(x_*) \subset J_\delta(x_k)$ and
$J(x_k) \subset J_0(x_*)$ hold in the general case of the nonlinear problem.
These statements will be used in what follows.

As was shown in the subsection on p. 194, the dual of the subsidiary
problem (5.37) is the maximization of function (5.21) with the
constraints $u^i \ge 0$, $i \in J_\delta(x_k)$. The Lagrange multipliers $u_k^i$ are the
solution of the dual problem and the equality of the optimum values
in the original and dual problems holds, i.e. we have

$$(f_0'(x_k), p_k) + \tfrac{1}{2}\|p_k\|^2 = -\tfrac{1}{2}\Big\| f_0'(x_k) + \sum_{i \in J_\delta(x_k)} u_k^i f_i'(x_k) \Big\|^2 + \sum_{i \in J_\delta(x_k)} u_k^i f_i(x_k).$$

Since $p_k \to 0$, the left-hand side of the last expression tends to
zero and consequently

$$-\tfrac{1}{2}\Big\| f_0'(x_k) + \sum_{i \in J_\delta(x_k)} u_k^i f_i'(x_k) \Big\|^2 + \sum_{i \in J_\delta(x_k)} u_k^i f_i(x_k) \to 0. \tag{5.40}$$

Note now that $u_k^i > 0$ only if $i \in J(x_k)$. Besides,

$$f_i(x) = (a_i, x) - b_i, \quad i \in \{0\} \cup J,$$

so that $f_i'(x) = a_i$ and is independent of $x$. Therefore, (5.40) can
be rewritten in the following form:

$$-\tfrac{1}{2}\Big\| a_0 + \sum_{i \in J(x_k)} u_k^i a_i \Big\|^2 + \sum_{i \in J(x_k)} u_k^i f_i(x_k) \to 0.$$

But $J(x_k) \subset J_0(x_*)$ as shown above and therefore $f_i(x_k) \to f_i(x_*) = 0$
for $i \in J(x_k)$, since $f_i(x_*) = 0$ with $i \in J_0(x_*)$ by definition. Since the
multipliers $u_k^i$ are bounded, it follows that

$$-\tfrac{1}{2}\Big\| a_0 + \sum_{i \in J(x_k)} u_k^i a_i \Big\|^2 \to 0.$$

But

$$-\tfrac{1}{2}\Big\| a_0 + \sum_{i \in J(x_k)} u_k^i a_i \Big\|^2 \le \max_{u^i \ge 0, \; i \in J(x_k)} \; -\tfrac{1}{2}\Big\| a_0 + \sum_{i \in J(x_k)} u^i a_i \Big\|^2 \le 0. \tag{5.41}$$

We introduce the notation

$$\omega(\mathcal{J}) = \max_{u^i \ge 0, \; i \in \mathcal{J}} \; -\tfrac{1}{2}\Big\| a_0 + \sum_{i \in \mathcal{J}} u^i a_i \Big\|^2;$$

$\omega(\mathcal{J})$ is a function defined on the sets of indices $\mathcal{J}$, $\mathcal{J} \subset J$. Since
$\mathcal{J} \subset J$ and $J$ is finite, this function can take only a finite number of values. It
follows from (5.41) that

$$\omega(J(x_k)) \to 0.$$

But this means that $\omega(J(x_k)) = 0$ for all sufficiently great $k$ since,
as was just mentioned, $\omega(\mathcal{J})$ can take only a finite number of values.
Thus, for great $k$

$$\omega(J(x_k)) = 0. \tag{5.42}$$


We now choose $k$ so great that $\alpha_k = 1$, condition (5.42) is fulfilled
and $J(x_k) \subset J_0(x_*)$. As $\alpha_k = 1$, we have $x_{k+1} = x_k + p_k$.
Since $x_k \to x_*$, $p_k \to 0$, we can assume that

$$f_i(x_{k+1}) \le \frac{\xi}{2} < 0, \quad i \notin J_0(x_*). \tag{5.43}$$

Let us consider again the subsidiary problem (5.37). As $p_k$ satisfies
the constraints of (5.37) and $f_i(x)$ are linear, we have

$$f_i(x_{k+1}) = (f_i'(x_k), p_k) + f_i(x_k) \le 0 \tag{5.44}$$

for $i \in J_\delta(x_k)$ and consequently for $i \in J_0(x_*)$ too, as
$J_0(x_*) \subset J_\delta(x_k)$. Together with (5.43), this shows that $x_{k+1}$ satisfies all constraints of
problem (5.2).

We demonstrate that $x_{k+1}$ is really the solution of problem (5.2).
Indeed, it follows from (5.38) and the definition of the set $J(x_k)$ that

$$f_i(x_{k+1}) = 0, \quad i \in J(x_k). \tag{5.45}$$

But (5.42) means that there are numbers $u^i \ge 0$, $i \in J(x_k)$, such
that

$$a_0 + \sum_{i \in J(x_k)} u^i a_i = 0. \tag{5.46}$$

Setting now $u^i = 0$, $i \notin J(x_k)$, we obtain that there are numbers
$u^i \ge 0$ such that the conditions

$$a_0 + \sum_{i \in J} u^i a_i = 0, \qquad u^i f_i(x_{k+1}) = 0, \; i \in J,$$

are fulfilled.

But the last relations (see Chap. I, Sec. 3) are the necessary and
sufficient conditions for the point $x_{k+1}$ to be the solution of the
problem of linear programming.

Thus the algorithm yields the solution after a finite number
of steps. Q.E.D.

Local Estimate of the Rate of Convergence


It was shown in the preceding subsection that the proposed algorithm
converges after a finite number of steps in the linear case.
Here we shall show that in the general nonlinear case the algorithm
converges at a geometric rate and, under certain favourable
circumstances, even at a quadratic rate.

Theorem 5.4. Let $x_*$ be the solution of problem (5.2) and let the following
conditions hold:

(a) For any sufficiently small $\delta > 0$, the subsidiary problem (5.4)
is solvable.

(b) The functions $f_i(x)$ are twice continuously differentiable and the
gradients $f_i'(x_*)$, $i \in J_0(x_*)$, where

$$J_0(x_*) = \{i : f_i(x_*) = 0, \; i \in J\},$$

are linearly independent.

(c) At the point $x_*$ the necessary condition for a minimum is satisfied
in the form

$$f_0'(x_*) + \sum_{i \in J_0(x_*)} u_*^i f_i'(x_*) = 0$$

and $u_*^i > 0$, $i \in J_0(x_*)$.

(d) The sufficient condition for a local minimum, i.e.

$$(p, L''(x_*, u_*)\, p) > 0,$$

holds for all $p \ne 0$ which also satisfy the condition

$$(p, f_i'(x_*)) = 0, \quad i \in J_0(x_*),$$

where

$$L(x, u) = f_0(x) + \sum_{i \in J_0(x_*)} u^i f_i(x)$$

and $L''$ is the matrix of second derivatives of $L(x, u)$ with respect to $x$.
Then there are a neighbourhood $\Omega$ of the point $x_*$, $\delta_0 > 0$ and $\alpha > 0$ such that
the process

$$x_{k+1} = x_k + \alpha p_k \tag{5.47}$$

converges to the point $x_*$ from any initial approximation $x_0 \in \Omega$ at a
geometric rate, i.e. there is a number $0 < q < 1$ such that
$\|x_k - x_*\| \le C q^k$ for all sufficiently great $k$.

Proof. The basic idea of the proof is as follows. As was shown above,
at the point $x_*$ the equation

$$p(x_*) = 0$$

is satisfied.
Process (5.47) is a simple iterative process for solving the last
equation. Therefore in order to estimate the rate of convergence we
can use Ostrowski's theorem which is formulated below. This theorem
requires an estimate of the eigenvalues of the matrix of first
derivatives of $p(x)$ at the point $x_*$. Therefore the main problem will be the
calculation of this matrix and its eigenvalues.

We shall break the proof of the theorem into several parts.

We take

$$J_0(x_*) = \{i \in J : f_i(x_*) = 0\}, \qquad \xi = \max_{i \notin J_0(x_*)} f_i(x_*) < 0.$$

Lemma 5.2. Let the conditions of the theorem be fulfilled and let
$\delta < -\xi/2$. Then there is a neighbourhood of the point $x_*$ such that
$J_\delta(x) = J_0(x_*)$ and $p(x)$ is continuously differentiable with respect
to $x$ in this neighbourhood. Moreover, the set

$$J(x) = \{i \in J_\delta(x) : (f_i'(x), p(x)) + f_i(x) = 0\}$$

coincides with the set $J_0(x_*)$.


Proof. Since all functions $f_i(x)$ are continuous, there is a neighbourhood
of the point $x_*$ such that

$$-\frac{\delta}{2} < f_i(x) < \frac{\delta}{2}, \quad i \in J_0(x_*), \tag{5.48}$$

$$f_i(x) < \frac{\xi}{2}, \quad i \notin J_0(x_*). \tag{5.49}$$

Recall now that

$$F(x) = \max\{0, \; \max_{i \in J} f_i(x)\}$$

and $i \in J_\delta(x)$ if $f_i(x) \ge F(x) - \delta$. It follows from (5.48), (5.49)
that

$$0 \le F(x) < \frac{\delta}{2} \tag{5.50}$$

and if $i \in J_0(x_*)$, then it follows from (5.50) that $F(x) - \delta < -\delta/2$ and consequently

$$f_i(x) > -\frac{\delta}{2} > F(x) - \delta,$$

i.e. $i \in J_\delta(x)$. On the other hand, if $i \notin J_0(x_*)$, then, since $\delta < -\xi/2$ and $F(x) \ge 0$,

$$f_i(x) < \frac{\xi}{2} < -\delta \le F(x) - \delta,$$

i.e. $i \notin J_\delta(x)$.

Thus we have shown that $J_\delta(x) = J_0(x_*)$ in a certain neighbourhood
of $x_*$.

Recall now that if $p(x)$ is the solution of problem (5.4), then
conditions (5.6) hold; these conditions can be rewritten in the equivalent
form:

$$p(x) + f_0'(x) + \sum_{i \in J(x)} u^i(x) f_i'(x) = 0, \tag{5.51.1}$$

$$(f_i'(x), p(x)) + f_i(x) = 0, \quad i \in J(x), \tag{5.51.2}$$

$$(f_i'(x), p(x)) + f_i(x) < 0, \quad i \notin J(x), \; i \in J_\delta(x), \tag{5.51.3}$$

where $u^i(x) \ge 0$.

We introduce the following notation: $\mathcal{J} \subset J_\delta(x)$ $(= J_0(x_*))$;
$f_{\mathcal{J}}'(x)$ for the matrix whose rows are $f_i'(x)$, $i \in \mathcal{J}$; $f_{\mathcal{J}}(x)$ for the column
vector with components $f_i(x)$, $i \in \mathcal{J}$; and $u_{\mathcal{J}}$ for the column vector
with components $u^i$, $i \in \mathcal{J}$. Then equations (5.51.1), (5.51.2) can
be rewritten as follows:

$$p(x) + f_0'(x) + f_{\mathcal{J}}'^{*}(x)\, u_{\mathcal{J}}(x) = 0,$$

$$f_{\mathcal{J}}'(x)\, p(x) + f_{\mathcal{J}}(x) = 0, \quad \mathcal{J} = J(x). \tag{5.52}$$

The last expressions can be considered to be a linear system of equations
to be solved for $p(x)$ and $u_{\mathcal{J}}(x)$. It is easy to see that in a certain
neighbourhood of $x_*$, system (5.52) has only one solution,
expressed by the formulas

$$u_{\mathcal{J}}(x) = \big(f_{\mathcal{J}}'(x)\, f_{\mathcal{J}}'^{*}(x)\big)^{-1} \big(f_{\mathcal{J}}(x) - f_{\mathcal{J}}'(x)\, f_0'(x)\big),$$

$$p(x) = -f_0'(x) - f_{\mathcal{J}}'^{*}(x)\, u_{\mathcal{J}}(x). \tag{5.53}$$

It follows from these formulas that if the set $\mathcal{J}$ is fixed, then $u_{\mathcal{J}}(x)$ and
$p(x)$ depend continuously on $x$.
Let now $x_k \to x_*$. We shall show that for all great $k$

$$J(x_k) = J_0(x_*).$$

Suppose that our statement is not true and there are arbitrarily great
numbers $k$ such that $J(x_k)$ is a proper subset of $J_0(x_*)$. Since there can
be but a finite number of different sets $J(x)$, we can assume, without
loss of generality, that a sequence $x_k \to x_*$ is chosen such that

$$J(x_k) = \mathcal{J}, \quad \mathcal{J} \subset J_0(x_*), \; \mathcal{J} \ne J_0(x_*).$$




Substituting now $x_k$ for $x$ in (5.51) and taking the limit ($p(x_k) \to \bar{p}$,
$u^i(x_k) \to \bar{u}^i$, $i \in \mathcal{J}$) we obtain that

$$\bar{p} + f_0'(x_*) + \sum_{i \in \mathcal{J}} \bar{u}^i f_i'(x_*) = 0,$$

$$(f_i'(x_*), \bar{p}) + f_i(x_*) = 0, \quad i \in \mathcal{J},$$

$$(f_i'(x_*), \bar{p}) + f_i(x_*) \le 0, \quad i \notin \mathcal{J}, \; i \in J_0(x_*) = J_\delta(x_*),$$

where $\bar{u}^i \ge 0$, for $u^i(x_k) \ge 0$. But these last relations show that $\bar{p}$
is the solution of the subsidiary problem (5.4) for the point $x_*$, i.e.
$\bar{p} = p(x_*)$. But the point $x_*$ is the solution of problem (5.2) and
therefore $p(x_*) = 0$. Consequently,

$$f_0'(x_*) + \sum_{i \in \mathcal{J}} \bar{u}^i f_i'(x_*) = 0.$$

Using now condition (c) of the theorem we obtain from the last
expression that

$$\sum_{i \in \mathcal{J}} (u_*^i - \bar{u}^i) f_i'(x_*) + \sum_{i \in J_0(x_*) \setminus \mathcal{J}} u_*^i f_i'(x_*) = 0$$

and, since $u_*^i > 0$, this contradicts condition (b) of the theorem. Thus, in a certain
neighbourhood of the point $x_*$, the set $J(x)$ coincides with $J_0(x_*)$. From
$J(x)$ being constant and formulas (5.53) it follows directly that
$u_{\mathcal{J}}(x)$ and $p(x)$, $\mathcal{J} = J_0(x_*)$, are continuously differentiable with
respect to $x$ since, by condition (b), $f_i(x)$ are twice continuously
differentiable.

Remark. Thus, in a small neighbourhood of $x_*$, $p(x)$ and $u_{\mathcal{J}}(x)$
are the solution of the system of equations (5.52) with a constant set
$\mathcal{J} = J_0(x_*)$. Therefore we shall omit the index $\mathcal{J}$ in $u_{\mathcal{J}}(x)$.


Lemma 5.3. The matrix $p'(x)$ of derivatives of the vector $p(x)$, i.e. the
matrix with elements $\partial p^i(x) / \partial x^j$, $i, j = 1, \ldots, n$, where $p^i(x)$ is
the $i$-th component of the vector $p(x)$, has at the point $x_*$ the following form:

$$p'(x_*) = -[P + (I - P)\, L''(x_*, u_*)]$$

where

$$P = f_{J_0(x_*)}'^{*}(x_*) \big(f_{J_0(x_*)}'(x_*)\, f_{J_0(x_*)}'^{*}(x_*)\big)^{-1} f_{J_0(x_*)}'(x_*)$$

and $u_* = u(x_*)$.
Proof. By differentiating the first of formulas (5.52) we obtain

$$p'(x_*) = -L''(x_*, u_*) - \sum_{j \in J_0(x_*)} f_j'(x_*)\, \big((u^j)'(x_*)\big)^{*} \tag{5.54}$$

where

$$(u^j)'(x) = \Big( \frac{\partial u^j(x)}{\partial x^1}, \ldots, \frac{\partial u^j(x)}{\partial x^n} \Big)^{*}.$$

From formula (5.51.2), by differentiating and using $p(x_*) = 0$, we obtain

$$f_i'^{*}(x_*)\, p'(x_*) + f_i'^{*}(x_*) = 0, \quad i \in J_0(x_*). \tag{5.55}$$

Note now that the operator $P$ defined in formulating the lemma is the
operator of projection on the subspace spanned by the vectors $f_i'(x_*)$,
$i \in J_0(x_*)$. Indeed, this can be seen (see also the subsection on
p. 147) from the easily verified relations:

(1) $P f_{J_0(x_*)}'^{*}(x_*) = f_{J_0(x_*)}'^{*}(x_*)$, or $P f_i'(x_*) = f_i'(x_*)$, $i \in J_0(x_*)$;
(2) $P^2 = P$, $P^{*} = P$;
(3) $(I - P) P = 0$.

If we now rewrite (5.55) in the form

$$f_{J_0(x_*)}'(x_*)\, p'(x_*) + f_{J_0(x_*)}'(x_*) = 0,$$

then, taking into account the expression for $P$, we obtain

$$P p'(x_*) = -P. \tag{5.56}$$

Further, it follows from relation (1) for $P$ that $(I - P) f_i'(x_*) = 0$.
Therefore applying $(I - P)$ to both sides of (5.54), we obtain

$$(I - P)\, p'(x_*) = -(I - P)\, L''(x_*, u_*). \tag{5.57}$$

Adding (5.56) and (5.57) we obtain the required formula for $p'(x_*)$.
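
The projection properties used in this proof are easy to check numerically. The sketch below does so for the special case of a single active constraint ($m = 1$), in which $(f' f'^{*})^{-1}$ reduces to the scalar $1 / \|a\|^2$ with $a = f_i'(x_*)$. The vector `a` and the helper names `projector` and `matmul` are hypothetical and chosen only for the illustration.

```python
def projector(a):
    """P = a a^T / (a, a): the matrix P of lemma 5.3 when J_0(x_*) holds a
    single index and a = f_i'(x_*) is its (nonzero) gradient."""
    s = sum(t * t for t in a)
    return [[ai * aj / s for aj in a] for ai in a]


def matmul(A, B):
    """Plain product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


a = [1.0, 2.0, 2.0]                       # hypothetical gradient, ||a||^2 = 9
n = len(a)
P = projector(a)
P2 = matmul(P, P)                          # relation (2): P^2 should equal P
Pa = [sum(P[i][j] * a[j] for j in range(n)) for i in range(n)]   # relation (1)
I_minus_P = [[(1.0 if i == j else 0.0) - P[i][j] for j in range(n)]
             for i in range(n)]
Z = matmul(I_minus_P, P)                   # relation (3): should vanish

print(max(abs(P2[i][j] - P[i][j]) for i in range(n) for j in range(n)))
print(max(abs(Pa[i] - a[i]) for i in range(n)))
print(max(abs(Z[i][j]) for i in range(n) for j in range(n)))
```

All three printed deviations should be at the level of rounding error.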
Lemma 5.4. The eigenvalues $\gamma_j$ of the matrix $p'(x_*)$ can be characterized
as follows: $\gamma_j = -1$ for $j = 1, 2, \ldots, m$, where $m \le n$ is the number
of indices in the set $J_0(x_*)$, and $\gamma_j = -\lambda_{j-m}$, $j = m + 1, \ldots, n$, where
$\lambda_j$, $j = 1, \ldots, n - m$, are the eigenvalues of the matrix
$(I - P)\, L''(x_*, u_*)\, (I - P)$ with $\lambda_j > 0$, $j = 1, \ldots, n - m$.
Proof. Let $\sigma$ be an eigenvalue and $y$ a corresponding eigenvector of the matrix
$p'(x_*)$. Then, according to lemma 5.3, we have

$$-Py - (I - P)\, L''(x_*, u_*)\, y = \sigma y = \sigma P y + \sigma (I - P) y.$$

We use now the relation $P(I - P) = 0$ and, by multiplying the
last equality in turn by $P$ and $I - P$, we obtain:

$$-Py = \sigma P y, \tag{5.58}$$

$$-(I - P)\, L''(x_*, u_*)\, y = \sigma (I - P) y. \tag{5.59}$$

There are two possible cases.

(1) $Py \ne 0$. Then it follows from (5.58) that $\sigma = -1$.

(2) $Py = 0$. In this case, $(I - P) y = y$ and (5.59) can be rewritten
as follows:

$$(I - P)\, L''(x_*, u_*)\, (I - P)\, y = -\sigma y, \tag{5.60}$$

i.e. $-\sigma$ is an eigenvalue of the matrix $(I - P) L'' (I - P)$. This matrix
is symmetric since $P = P^{*}$ and $L'' = (L'')^{*}$, being the matrix of
second derivatives of the function $L$. Moreover, the matrix under
consideration is nonnegative definite. Indeed, for any $w$ we have

$$(w, (I - P) L'' (I - P) w) = (z, L'' z)$$

where

$$z = (I - P) w.$$

But $f_{J_0(x_*)}'(x_*)\, z = f_{J_0(x_*)}'(x_*) (I - P) w = 0$ and therefore
$(z, L'' z) \ge 0$ by condition (d) of theorem 5.4, the equality sign
being possible only if $z = (I - P) w = 0$. It follows from the symmetry
of the matrix $(I - P) L'' (I - P)$ that its eigenvalues and eigenvectors
are real. As $y \ne 0$ and $y = Py + (I - P) y$, it follows from
$Py = 0$ that $(I - P) y \ne 0$, and therefore from (5.60) we obtain

$$-\sigma (y, y) = (y, (I - P) L'' (I - P) y) = (y, L'' y) > 0.$$

Thus, $-\sigma > 0$ and consequently $\sigma = -\lambda_j$, where $\lambda_j > 0$ is
an eigenvalue of the matrix $(I - P) L'' (I - P)$.

Thus we have proved that the eigenvalues of the matrix $p'(x_*)$ are
real and equal either to $-1$ or to $-\lambda_j$, $\lambda_j > 0$. It remains only to
determine the number of eigenvalues which are equal to $-1$.

Due to the fact that

$$P f_i'(x_*) = f_i'(x_*), \quad i \in J_0(x_*),$$

the operator $(I - P)$ has $m$ eigenvectors $f_i'(x_*)$ which correspond to the
zero eigenvalue. Therefore, the matrix $(I - P) L'' (I - P)$ also has $m$
zero eigenvalues. On the other hand, as we have seen, all $n$ eigenvalues of the matrix $p'(x_*)$
are nonzero, each being either equal to $-1$ or to the negative of
a positive eigenvalue of the matrix $(I - P) L'' (I - P)$. Clearly, this is possible only
if the statement of lemma 5.4 holds true.

We now complete the proof of theorem 5.4. It follows from Ostrowski's
theorem that if $x_*$ is the solution of the equation $p(x) = 0$ and
the eigenvalues of the matrix $I + \alpha p'(x_*)$ have moduli less than
unity, then the method of simple iteration $x_{k+1} = x_k + \alpha p(x_k)$
converges from all points of a certain neighbourhood of the point $x_*$ and
at the same time the following rate of convergence holds: for
each $\varepsilon > 0$ there is a number $C(\varepsilon)$ such that
$\|x_k - x_*\| \le C(\varepsilon) (q_0 + \varepsilon)^k$, where $q_0$ is the greatest of the moduli of the
eigenvalues of the matrix $I + \alpha p'(x_*)$.

Consider now the eigenvalues of the matrix $I + \alpha p'(x_*)$. They are
equal either to $1 - \alpha$ or to $1 - \alpha \lambda_j$. We choose now $\alpha$ so that all
of the following inequalities are satisfied:

$$1 - \alpha > -1, \qquad 1 - \alpha \lambda_j > -1, \quad j = 1, \ldots, n - m,$$

i.e. so that $0 < \alpha < \min\{2, 2/\lambda_{\max}\}$, where $\lambda_{\max} = \max \lambda_j$,
$j = 1, \ldots, n - m$. Then all the eigenvalues of the matrix $I + \alpha p'(x_*)$ will
have moduli less than unity; hence, referring also to Ostrowski's
results, we have that theorem 5.4 holds.
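
The choice of $\alpha$ can be made concrete with a small computation. The sketch below evaluates the admissible interval $0 < \alpha < \min\{2, 2/\lambda_{\max}\}$ and the resulting factor $q_0$ for given eigenvalues; the function names and the sample eigenvalues are hypothetical, and in practice the $\lambda_j$ are usually not known in advance.

```python
def admissible_alpha_bound(lambdas):
    """Upper end of the interval 0 < alpha < min{2, 2/lambda_max}."""
    return min(2.0, 2.0 / max(lambdas)) if lambdas else 2.0


def rate_factor(alpha, lambdas, m):
    """Largest modulus q_0 of the eigenvalues of I + alpha*p'(x_*):
    1 - alpha (present when there are m >= 1 active constraints) and
    1 - alpha*lambda_j for the positive eigenvalues lambda_j."""
    eig = [1.0 - alpha * lam for lam in lambdas]
    if m > 0:
        eig.append(1.0 - alpha)
    return max(abs(e) for e in eig)


lambdas = [0.5, 4.0]                      # hypothetical positive eigenvalues
print(admissible_alpha_bound(lambdas))    # bound on alpha
print(rate_factor(0.4, lambdas, m=1))     # inside the interval: q_0 < 1
print(rate_factor(0.6, lambdas, m=1))     # outside the interval: q_0 > 1
```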

Theorem 5.5. Let the conditions of the preceding theorem be satisfied
and, besides, let $m$ (the number of indices in the set $J_0(x_*)$) be equal to $n$ (the
dimension of the space). In this case, process (5.47) converges from a
certain neighbourhood of the point $x_*$ with $\alpha = 1$ at a quadratic rate.

Proof. It follows from lemma 5.4 for the case under consideration
that all the eigenvalues of the matrix $p'(x_*)$ are equal to $-1$, and therefore
the eigenvalues of the matrix $I + \alpha p'(x_*)$ are equal to $1 - \alpha$.
If $\alpha = 1$, then all the eigenvalues are equal to zero and $q_0 = 0$. Therefore,
according to Ostrowski's theorem, we obtain
$\|x_k - x_*\| \le C(\varepsilon)\, \varepsilon^k$ and this means that the process converges at a higher
rate than that of any geometric progression. In fact, in this case
process (5.47) passes into Newton's method for solving the system of
equations $f_i(x) = 0$, $i \in J_0(x_*)$, which, as is well known and as shown
below in Sec. 6, converges quadratically.

Remark. All the arguments in this subsection were conducted for
the case of a problem with only inequality constraints. It is obvious,
however, that all the results obtained can be applied to the case
with equality constraints.


6. LINEARIZATION METHOD: 
SOLVING SYSTEMS OF EQUALITIES AND 
INEQUALITIES AND FINDING THE MINIMAX 


In this section, the linearization method is applied to two problems
closely connected to the usual problem of mathematical programming.
It turns out that in this case one can construct
effective algorithms which have a fast rate of convergence.


Systems of Equalities and Inequalities 


Given two finite sets of indices $J^-$ and $J^0$ and functions $f_i(x)$,
$x \in E^n$, it is required to find a solution of the following system:

$$f_i(x) \le 0, \; i \in J^-, \qquad f_i(x) = 0, \; i \in J^0. \tag{6.1}$$

Suppose that the functions $f_i(x)$ have continuous gradients $f_i'(x)$ and
also that the gradients satisfy Lipschitz' condition with constant $L$:

$$\|f_i'(x_1) - f_i'(x_2)\| \le L \|x_1 - x_2\|.$$

The norm of vectors is everywhere Euclidean.
We use the notation:

$$F(x) = \max\Big( \max_{i \in J^-} f_i(x), \; \max_{i \in J^0} |f_i(x)| \Big),$$

$$J_\delta^-(x) = \{i : i \in J^-, \; f_i(x) \ge F(x) - \delta\},$$

$$J_\delta^0(x) = \{i : i \in J^0, \; |f_i(x)| \ge F(x) - \delta\}.$$

We choose an initial point $x_0$ and assume that for all $x$ that satisfy
the inequality $F(x) \le F(x_0)$, the gradients $f_i'(x)$ are bounded in
norm by a constant $K$.

Basic assumption. There are numbers $\delta > 0$ and $C > 0$ such that
for all $x$ for which $F(x) > 0$, $F(x) \le F(x_0)$, the following system
is solvable for $p$:

$$(f_i'(x), p) + f_i(x) \le 0, \quad i \in J_\delta^-(x),$$

$$(f_i'(x), p) + f_i(x) = 0, \quad i \in J_\delta^0(x). \tag{6.2}$$

Let $p(x)$ be the solution of (6.2) that has the minimum norm. Then
for $x$ such that $F(x) > 0$,

$$\|p(x)\| \le C F(x). \tag{6.3}$$

The inequality (6.3) characterizes to a certain extent the regular
solvability of system (6.2). In particular, if system (6.2) is
a system of $n$ equations in $n$ unknowns, condition (6.3)
is equivalent to the assumption that the matrix of the corresponding
system is nonsingular. As will be shown further on, (6.3) holds
if the gradients $f_i'(x)$, $i \in J_\delta^-(x) \cup J_\delta^0(x)$, are linearly independent
for all $x$ with $F(x) > 0$.

We turn now to the construction of the algorithm. The successive
approximations are constructed by the formula

$$x_{k+1} = x_k + \alpha_k p_k, \qquad p_k = p(x_k), \tag{6.4}$$

where the parameter $\alpha_k$ is chosen by sequentially halving unity until
the following inequality is satisfied:

$$F(x_k + \alpha_k p_k) \le (1 - \varepsilon \alpha_k)\, F(x_k) \tag{6.5}$$

where $\varepsilon$ is any number chosen from the beginning, $0 < \varepsilon < 1$.
Clearly, formula (6.4) is applicable if $F(x_k) > 0$. Otherwise, the process
stops and $x_k$ is the solution of (6.1).
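
The scheme (6.4)-(6.5) can be sketched for the simplest case of a single inequality $f(x) \le 0$. Under that assumption $F(x) = f(x)$, the set $J_\delta^-(x)$ always contains the only index, and for $f(x) > 0$ the minimum-norm solution of the linearized inequality has the closed form $p = -f(x) f'(x) / \|f'(x)\|^2$. The function name `solve_inequality`, the tolerance `tol` used in place of the exact test $F(x_k) \le 0$, and the sample function are hypothetical.

```python
def solve_inequality(f, grad, x, eps=0.5, tol=1e-12, max_iter=50):
    """Find x with f(x) <= 0 by the scheme (6.4)-(6.5) for one inequality.

    Returns the final point and the list of values F(x_k) = f(x_k).
    """
    history = [f(x)]
    for _ in range(max_iter):
        Fx = f(x)
        if Fx <= tol:                       # F(x_k) <= 0 up to tolerance: stop
            break
        g = grad(x)
        s = sum(t * t for t in g)           # ||f'(x)||^2, assumed nonzero
        # Minimum-norm p with (f'(x), p) + f(x) <= 0 when f(x) > 0
        p = [-Fx * t / s for t in g]
        alpha = 1.0
        while alpha > 1e-12 and \
                f([xi + alpha * pi for xi, pi in zip(x, p)]) > (1.0 - eps * alpha) * Fx:
            alpha *= 0.5                    # sequential halving of unity
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        history.append(f(x))
    return x, history


# Hypothetical example: reach the unit disc x1^2 + x2^2 - 1 <= 0 from (2, 2).
x, history = solve_inequality(lambda x: x[0] ** 2 + x[1] ** 2 - 1.0,
                              lambda x: [2.0 * x[0], 2.0 * x[1]],
                              [2.0, 2.0])
print(x)
print(history)
```

With several constraints the minimum-norm solution of (6.2) no longer has a closed form and a quadratic programming step would be needed; the single-constraint case is chosen here only to keep the sketch short.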
Convergence of the Algorithm


The behaviour of the proposed algorithm is characterized by
the following theorem.

Theorem 6.1. Let all the assumptions of the preceding subsection
be fulfilled. Then the sequence $\{x_k\}$, $k = 0, 1, \ldots$, generated by the
algorithm according to formula (6.4) converges to $\bar{x}$, a solution of system
(6.1), and at the same time

(a) for sufficiently great $k$, $\alpha_k = 1$;

(b) for sufficiently great $k$,

$$F(x_{k+1}) \le L C^2 F^2(x_k);$$

(c) for any $q$, $0 < q < 1$, there is a number $k(q)$ such that

$$\|x_k - \bar{x}\| \le \frac{q^{2^{k - k(q)}}}{L C (1 - q)} \tag{6.6}$$

for all $k \ge k(q)$.
Proof. Obviously, if $F(x_k) \le 0$ at a certain step, all the statements
are proved. Therefore we suppose that $F(x_k) > 0$ for all $k$.

First of all we show that the choice of $\alpha_k$ under condition (6.5) is
always feasible. For $i \in J_\delta^-(x_k)$, using Taylor's formula, we have:

$$\begin{aligned}
f_i(x_k + \alpha p_k) &= f_i(x_k) + \alpha (f_i'(x_k + \theta_i \alpha p_k), p_k) \\
&= f_i(x_k) + \alpha (f_i'(x_k), p_k) + \alpha (f_i'(x_k + \theta_i \alpha p_k) - f_i'(x_k), p_k)
\end{aligned}$$

where $0 \le \theta_i \le 1$. But since $p_k$ satisfies (6.2), we have

$$(f_i'(x_k), p_k) \le -f_i(x_k).$$

Further,

$$(f_i'(x_k + \theta_i \alpha p_k) - f_i'(x_k), p_k) \le \|p_k\|\, \|f_i'(x_k + \theta_i \alpha p_k) - f_i'(x_k)\| \le \|p_k\|\, L \theta_i \alpha \|p_k\| \le \alpha L \|p_k\|^2.$$

Therefore, using (6.3), we obtain

$$f_i(x_k + \alpha p_k) \le f_i(x_k) - \alpha f_i(x_k) + \alpha^2 L \|p_k\|^2 \le (1 - \alpha) F(x_k) + \alpha^2 L C^2 F^2(x_k). \tag{6.7}$$

For $i \in J^-$, $i \notin J_\delta^-(x_k)$, we have $f_i(x_k) < F(x_k) - \delta$ and, therefore,

$$f_i(x_k + \alpha p_k) = f_i(x_k) + \alpha (f_i'(x_k + \theta_i \alpha p_k), p_k) \le F(x_k) - \delta + \alpha K \|p_k\| \le F(x_k) - \delta + \alpha K C F(x_k). \tag{6.8}$$

Quite similarly, we have for $i \in J_\delta^0(x_k)$

$$|f_i(x_k + \alpha p_k)| \le (1 - \alpha) F(x_k) + \alpha^2 L C^2 F^2(x_k) \tag{6.9}$$

and for $i \in J^0$, $i \notin J_\delta^0(x_k)$

$$|f_i(x_k + \alpha p_k)| \le F(x_k) - \delta + \alpha K C F(x_k). \tag{6.10}$$

Note now that

$$(1 - \alpha) F(x_k) \ge F(x_k) - \delta + \alpha C K F(x_k)$$

if $\alpha \le \alpha_k^1$, where

$$\alpha_k^1 = \frac{\delta}{(1 + C K) F(x_k)}.$$

Therefore for $\alpha \le \alpha_k^1$ it follows from (6.7)-(6.10) that

$$F(x_k + \alpha p_k) \le (1 - \alpha) F(x_k) + \alpha^2 L C^2 F^2(x_k)$$

or

$$F(x_k + \alpha p_k) \le F(x_k) - \alpha F(x_k)\,[1 - \alpha L C^2 F(x_k)]. \tag{6.11}$$

If $\alpha \le \alpha_k^2$, where

$$\alpha_k^2 = \frac{1 - \varepsilon}{L C^2 F(x_k)},$$

then $1 - \alpha L C^2 F(x_k) \ge \varepsilon$ and therefore (6.11) can be rewritten as

$$F(x_k + \alpha p_k) \le F(x_k) - \alpha \varepsilon F(x_k), \qquad \alpha \le \min\{\alpha_k^1, \alpha_k^2\}. \tag{6.12}$$

It is now clear that if we proceed to reduce $\alpha$ beginning with $\alpha = 1$,
then inequality (6.12) will hold after a finite number of trials and
the $\alpha_k$ chosen will satisfy the inequality

$$\alpha_k \ge \min\Big\{1, \; \tfrac{1}{2} \alpha_k^1, \; \tfrac{1}{2} \alpha_k^2\Big\}. \tag{6.13}$$

Thus, we have proved that the choice of $\alpha_k$ under condition (6.5)
is feasible and that this choice can be realized after a finite number
of operations.

We show that $F(x_k) \to 0$. Indeed, it follows from (6.5) that $F(x_k)$
decreases monotonically. Therefore, it can be concluded from the
formulas for $\alpha_k^1$ and $\alpha_k^2$ that these quantities increase with increasing
$k$. Consequently, formula (6.13) permits us to conclude that
$\alpha_k \ge \bar\alpha > 0$ and so

$$F(x_{k+1}) \le (1 - \varepsilon \alpha_k) F(x_k) \le (1 - \varepsilon \bar\alpha) F(x_k).$$

Therefore $F(x_k) \le (1 - \varepsilon \bar\alpha)^k F(x_0)$, hence $F(x_k) \to 0$. But then
$\alpha_k^1 \to +\infty$, $\alpha_k^2 \to +\infty$, as can be seen directly from the formulas
for these quantities. Therefore, (6.13) permits us to conclude that
$\alpha_k = 1$ for sufficiently large $k$. But (6.11) shows for all such $k$, if
$\alpha = 1$ is substituted into it, that

$$F(x_{k+1}) \le LC^2 F^2(x_k). \tag{6.14}$$

Thus statements (a) and (b) of the theorem have been proved.

We can now assert that there is a $k_0$ such that $\alpha_k = 1$ for $k \ge k_0$
and (6.14) is satisfied. Therefore, by (6.3),

$$\|x_{k+1} - x_k\| = \|p_k\| \le C F(x_k).$$

We set $v_k = LC^2 F(x_k)$. Then $v_k \to 0$ and (by (6.14)) $v_{k+1} \le v_k^2$. Let
$q$ be such that $0 < q < 1$. Then there is a $k(q)$ such that $v_k \le q$
for $k \ge k(q)$. Therefore $v_{k+1} \le q v_k$, $k \ge k(q)$. Hence

$$v_k \le v_{k(q)}^{2^{k-k(q)}} \le q^{2^{k-k(q)}}, \qquad v_m \le q^{m-k} v_k, \qquad m > k \ge k(q).$$

This permits us to obtain the following estimate:

$$\|x_m - x_k\| \le \sum_{j=k}^{m-1} \|x_{j+1} - x_j\| \le \sum_{j=k}^{m-1} \frac{v_j}{LC} \le \frac{v_k}{LC} \sum_{j=0}^{m-1-k} q^j \le \frac{v_k}{LC(1-q)} \le \frac{q^{2^{k-k(q)}}}{LC(1-q)}. \tag{6.15}$$

It follows from this estimate (according to the well-known Cauchy
criterion) that the sequence $\{x_k\}$ converges to a certain point $x_*$. Since
$F(x_k) \to 0$, we have $F(x_*) = 0$, i.e. $x_*$ is the solution of system (6.1).
Moreover, taking the limit in (6.15) as $m \to \infty$ we obtain

$$\|x_* - x_k\| \le \frac{q^{2^{k-k(q)}}}{LC(1-q)}.$$

Q.E.D.


Remarks 

Remark 1. Suppose we are solving a system of $n$ equations $f_i(x) = 0$,
$i = 1, \ldots, n$, where $x \in E^n$. Then

$$|f_i(x)| \ge F(x) - \delta, \quad i = 1, \ldots, n, \qquad F(x) = \max_{1 \le i \le n} |f_i(x)|$$

for any $\delta > 0$, provided $x$ is sufficiently close to the solution $x_*$. Therefore,
$J_\delta^0(x) = \{1, 2, \ldots, n\}$ and system (6.2) takes the form

$$(f_i'(x), p) + f_i(x) = 0, \quad i = 1, \ldots, n. \tag{6.16}$$

Therefore the method proposed coincides with Newton's method
in which iterations are performed by the formula $x_{k+1} = x_k + p(x_k)$,
where $p(x)$ is the solution of system (6.16). The condition
for the convergence of Newton's method is the nonsingularity at
point $x_*$ of matrix $f'(x)$, where $f'(x)$ is an $n \times n$ matrix whose rows
are $f_i'(x)$. In this case, $p(x) = -(f'(x))^{-1} f(x)$, where $f(x)$ is a
column-vector whose components are $f_i(x)$. But it follows from the
last formula that

$$\|p(x)\| \le \|(f'(x))^{-1}\|\,\|f(x)\| \le C_0\, \|(f'(x))^{-1}\|\, F(x)$$

where $C_0$ is a constant. It can be seen from this inequality that (6.3)
holds in a certain neighbourhood of point $x_*$.

Thus it follows from the theorem proved that the usual Newton's
method is locally convergent in solving a system of $n$ equations with
$n$ unknowns.

Remark 2. If only one equation $f(x) = 0$ in $n$ unknowns is to be
solved, then system (6.2) takes the form

$$(f'(x), p) + f(x) = 0 \tag{6.17}$$

and it is required to find the solution of this equation with a minimum
norm, i.e. to find the minimum of $\|p\|^2$ with constraint (6.17).
Using the rule of Lagrange multipliers, we have in this case

$$p(x) = -\frac{f(x)}{\|f'(x)\|^2}\, f'(x),$$

hence

$$\|p(x)\| = \frac{|f(x)|}{\|f'(x)\|}.$$

Clearly, formula (6.3) will be satisfied if $\|f'(x)\| \ge \gamma > 0$ for all $x$.
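
Remark 2 lends itself to a short numerical sketch. The code below is not from the book; it assumes Python with NumPy, takes $F(x) = |f(x)|$, uses the minimum-norm step $p(x)$ of the remark, and halves $\alpha$ starting from 1 until a decrease test of the form $F(x + \alpha p) \le (1 - \varepsilon\alpha) F(x)$ holds, which is how condition (6.5) is read in the proof of theorem 6.1. The name `min_norm_newton`, the default parameters, and the test equation are hypothetical, and the gradient is assumed never to vanish.

```python
import numpy as np

def min_norm_newton(f, grad, x0, eps=0.5, tol=1e-12, max_iter=50):
    """Solve one equation f(x) = 0 in n unknowns with minimum-norm Newton steps.

    Assumed decrease test (our reading of condition (6.5)):
        |f(x + a p)| <= (1 - eps * a) * |f(x)|,  0 < eps < 1.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) <= tol:
            break
        g = grad(x)
        # Minimum-norm solution of (f'(x), p) + f(x) = 0, as in Remark 2.
        p = -fx / float(g @ g) * g
        a = 1.0
        # Halve the step until the decrease test holds (guard against a -> 0).
        while abs(f(x + a * p)) > (1.0 - eps * a) * abs(fx) and a > 1e-12:
            a *= 0.5
        x = x + a * p
    return x

# Hypothetical example: the unit circle x1^2 + x2^2 - 1 = 0 in E^2.
circle = lambda x: x[0] ** 2 + x[1] ** 2 - 1.0
circle_grad = lambda x: 2.0 * x
root = min_norm_newton(circle, circle_grad, [2.0, 1.0])
```

In this example every step is parallel to the current point, so the iterates stay on the ray through the starting point and approach the point where that ray meets the circle.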

Remark 3. The finding of vector p (x) at each step involves 
the solving of the problem of minimization of || p ||? with constraints 
(6.2). This is a problem of quadratic programming. Concerning the 
methods of solving it, we can use the same information given in 
Sec. 5 about the solving of the subsidiary problem of quadratic pro- 
gramming which arises in the linearization method. 


Sufficient Conditions of Convergence 


The main condition (6.3) which guarantees the convergence of 
the algorithm is not easy to check. This subsection describes con- 
ditions that can be checked more effectively. In particular, for the 
convex case if there is an interior point in the domain defined by 
expressions (6.1), the conditions guarantee the convergence of the 
algorithm. 

Let the system contain only inequality constraints, i.e.

$$f_i(x) \le 0, \quad i \in I^-. \tag{6.18}$$

Then the subsidiary system (6.2) takes the form

$$(f_i'(x), p) + f_i(x) \le 0, \quad i \in J_\delta^-(x). \tag{6.19}$$

Clearly, this system can be solved with $F(x) > 0$ if the system

$$(f_i'(x), p) + F(x) \le 0, \quad i \in J_\delta^-(x) \tag{6.20}$$

is solvable.

Lemma 6.1. If $F(x) > 0$, then system (6.20) has a solution if and
only if

$$L_\delta(x) = \min_{\lambda} \Big\| \sum_{i \in J_\delta^-(x)} \lambda_i f_i'(x) \Big\| > 0$$

where the minimum is taken over all $\lambda_i \ge 0$ such that

$$\sum_{i \in J_\delta^-(x)} \lambda_i = 1.$$

Then the solution $\bar p(x)$ of system (6.20) with a minimum norm satisfies
the equality

$$\|\bar p(x)\| = \frac{F(x)}{L_\delta(x)}.$$

Proof. Let $\lambda_i \ge 0$ be such that their sum over all $i \in J_\delta^-(x)$ is
equal to unity. If $p$ is a solution of (6.20), then

$$-\sum_{i \in J_\delta^-(x)} \lambda_i (f_i'(x), p) \ge F(x),$$

or

$$\Big( -\sum_{i \in J_\delta^-(x)} \lambda_i f_i'(x),\ p \Big) \ge F(x).$$

Using the inequality $(x, y) \le \|x\|\,\|y\|$, we obtain

$$\Big\| \sum_{i \in J_\delta^-(x)} \lambda_i f_i'(x) \Big\|\, \|p\| \ge F(x).$$

But the last inequality holds with any $\lambda_i$ chosen as mentioned above
and therefore

$$L_\delta(x)\, \|p\| \ge F(x),$$

i.e. $L_\delta(x) > 0$ and

$$\|p\| \ge \frac{F(x)}{L_\delta(x)}. \tag{6.21}$$

Thus, it has been proved that the conditions of the lemma are
necessary.

Suppose now that $L_\delta(x) > 0$.

Consider the problem: to find the minimum of $\rho$ with the following
constraints:

$$(f_i'(x), p) + F(x) - \rho \le 0, \quad i \in J_\delta^-(x),$$
$$\|p\| \le r_0, \qquad r_0 = \frac{F(x)}{L_\delta(x)} > 0. \tag{6.22}$$

This is a problem of convex programming, and all the conditions
of the Kuhn-Tucker theorem, in particular Slater's condition, are
obviously fulfilled. Let $\rho_0$, $p_0$ be the solution. Applying the Kuhn-Tucker
theorem, we obtain that there are $\lambda_i \ge 0$ such that for all
$p$, $\|p\| \le r_0$, and for all $\rho$ the following inequality holds:

$$\rho_0 + \sum_{i \in J_\delta^-(x)} \lambda_i \big( (f_i'(x), p_0) + F(x) - \rho_0 \big) \le \rho + \sum_{i \in J_\delta^-(x)} \lambda_i \big( (f_i'(x), p) + F(x) - \rho \big). \tag{6.23}$$

Besides, we have

$$\lambda_i \big( (f_i'(x), p_0) + F(x) - \rho_0 \big) = 0, \quad i \in J_\delta^-(x). \tag{6.24}$$

Since $\rho$ is arbitrary, it follows directly from (6.23) that

$$\sum_{i \in J_\delta^-(x)} \lambda_i = 1.$$

Because of (6.24), we can rewrite (6.23) in the form

$$\rho_0 \le \Big( \sum_{i \in J_\delta^-(x)} \lambda_i f_i'(x),\ p \Big) + F(x).$$

Taking the minimum with respect to $p$, $\|p\| \le r_0$, of the right-hand
side of the last inequality we obtain

$$\rho_0 \le -r_0 \Big\| \sum_{i \in J_\delta^-(x)} \lambda_i f_i'(x) \Big\| + F(x) \le -r_0 L_\delta(x) + F(x) = 0.$$

Thus, $\rho_0 \le 0$, i.e. vector $p_0$ satisfies the system of inequalities
(see (6.22))

$$(f_i'(x), p_0) + F(x) \le \rho_0 \le 0, \quad i \in J_\delta^-(x),$$

and we have also $\|p_0\| \le r_0 = F(x)/L_\delta(x)$. But it follows from
(6.21) that

$$\|p_0\| \ge \frac{F(x)}{L_\delta(x)}.$$

Therefore

$$\|p_0\| = \frac{F(x)}{L_\delta(x)}$$

and vector $p_0$ is the solution of system (6.20). Moreover, (6.21)
shows that this is a solution with a minimum norm. Thus, $p_0 = \bar p(x)$
and the lemma is proved.
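
For the special case of two active gradients, the quantities in lemma 6.1 can be computed in closed form. The following sketch is an illustration and not part of the source; it assumes Python with NumPy, and the helper names are hypothetical. It uses the standard fact that if $w$ is the point of least norm in the convex hull of the gradients, then $L_\delta(x) = \|w\|$ and $\bar p = -F(x)\, w / \|w\|^2$ satisfies (6.20) with $\|\bar p\| = F(x)/L_\delta(x)$.

```python
import numpy as np

def min_norm_in_segment(g1, g2):
    """Point of least norm on the segment {lam*g1 + (1-lam)*g2, 0 <= lam <= 1}."""
    d = g1 - g2
    dd = float(d @ d)
    # Unconstrained minimizer of ||g2 + lam*d||^2, clipped to [0, 1].
    lam = 0.0 if dd == 0.0 else min(1.0, max(0.0, -float(g2 @ d) / dd))
    return lam * g1 + (1.0 - lam) * g2

def lemma_6_1_direction(g1, g2, F):
    """Return (L_delta, p_bar) for two active gradients g1, g2 and F = F(x) > 0."""
    w = min_norm_in_segment(g1, g2)
    L = float(np.linalg.norm(w))
    if L == 0.0:
        raise ValueError("L_delta(x) = 0: system (6.20) has no solution")
    p_bar = -F * w / L ** 2
    return L, p_bar

# Hypothetical data: two orthogonal unit gradients and F(x) = 2.
g1 = np.array([1.0, 0.0])
g2 = np.array([0.0, 1.0])
L, p_bar = lemma_6_1_direction(g1, g2, F=2.0)
```

With more than two active gradients the least-norm point has to be found by a quadratic programming routine; the closed form above is specific to a segment.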

Theorem 6.2. Let all the assumptions of the subsection on p. 211
hold, except the main one. Moreover, let $L_\delta(x) \ge \gamma > 0$ for all $x$ such
that $0 < F(x) \le F(x_0)$. Then the conditions of the main assumption
are fulfilled too and all the results of theorem 6.1 hold for problem
(6.18).

Proof. Let $p(x)$ be the minimum-norm solution of system (6.19) and
$\bar p(x)$ the minimum-norm solution of system (6.20). Since any solution
of system (6.20) is also a solution of system (6.19), we have

$$\|p(x)\| \le \|\bar p(x)\|.$$

Therefore by lemma 6.1, we have

$$\|p(x)\| \le \frac{F(x)}{L_\delta(x)} \le \frac{1}{\gamma}\, F(x);$$

this shows that all the conditions of theorem 6.1 are satisfied.

Note that the condition $L_\delta(x) \ge \gamma > 0$ is natural enough, for it
requires linear independence of vectors $f_i'(x)$, $i \in J_\delta^-(x)$.

Theorem 6.3. Let functions $f_i(x)$ in problem (6.18) be convex and
continuously differentiable. Besides, let the domain defined by the
inequality $F(x) \le F(x_0)$ be compact, the gradients $f_i'(x)$ in this domain
satisfy Lipschitz' condition and there be a point $\bar x$ such that $F(\bar x) = \gamma < 0$.
Then with $\delta < -\gamma$ all the conditions of theorem 6.1 are fulfilled.

Proof. As $f_i(x)$ are convex, we have

$$f_i(\bar x) \ge f_i(x) + (f_i'(x), \bar x - x), \quad i \in I^-.$$

For $i \in J_\delta^-(x)$ with $p = \bar x - x$ we have

$$f_i(\bar x) + \delta \ge f_i(x) + \delta + (f_i'(x), p).$$

But $f_i(\bar x) + \delta \le F(\bar x) + \delta = \gamma + \delta < 0$ and $f_i(x) + \delta \ge F(x)$,
$i \in J_\delta^-(x)$. Therefore

$$0 > \gamma + \delta \ge F(x) + (f_i'(x), p), \quad i \in J_\delta^-(x).$$

Setting $\gamma + \delta = -\varepsilon$, we obtain that

$$(f_i'(x), p) + (F(x) + \varepsilon) \le 0, \quad i \in J_\delta^-(x).$$

But by lemma 6.1, this means that for all $x$ such that $F(x) \le F(x_0)$,
$F(x) + \varepsilon > 0$, system (6.20) is solvable and for such $x$ also $L_\delta(x) > 0$.
Now since the domain $F(x) \le F(x_0)$ is compact and the
functions are continuous, it can be easily ascertained that
$L_\delta(x) \ge \nu > 0$ for all $x$ such that $0 < F(x) \le F(x_0)$.

Thus all the conditions of theorem 6.2 are satisfied and this completes
the proof of theorem 6.3.


Solving the Problem of Finding the Minimax 
Let functions $f_i(x)$, $i = 1, \ldots, m$, be given. We compose the function

$$F(x) = \max_{1 \le i \le m} f_i(x). \tag{6.25}$$

The problem is now to find a point $x \in E^n$ which minimizes $F(x)$.
It is easy to see that this problem can be reduced to the following
one by introducing an additional variable $x^{n+1}$: to minimize
$f_0(x, x^{n+1}) = x^{n+1}$ with constraints

$$f_i(x) - x^{n+1} \le 0, \quad i = 1, \ldots, m.$$

Therefore the methods described above, in particular the linearization
method, are now applicable. Note also that in this way we can
also solve the problem of the minimization of $F(x)$ if $x$ varies in a
certain domain $\Omega$ defined by a system of equalities or inequalities.

In this subsection, we shall discuss the method of minimization
of $F(x)$ with $x \in E^n$. This method is based on a slight modification
of the linearization method.

Let us introduce at each point $x$ the following subsidiary problem:

$$\min_{p,\,\beta} \left( \beta + \tfrac{1}{2} \|p\|^2 \right),$$
$$(f_i'(x), p) + f_i(x) - \beta \le 0, \quad i \in J_\delta(x), \tag{6.26}$$

where $\delta > 0$ and

$$J_\delta(x) = \{ i : 1 \le i \le m,\ f_i(x) \ge F(x) - \delta \}.$$

Note that problem (6.26) is a problem of convex programming for
which Slater's condition is satisfied, for taking $\beta$ sufficiently large
we can always satisfy constraints (6.26) strictly. By applying directly
the Kuhn-Tucker theorem in its differential form, we now find that
$p(x)$ and $\beta(x)$ is the solution of problem (6.26) if and only if there
are $u^i \ge 0$, $i \in J_\delta(x)$, such that

$$\sum_{i \in J_\delta(x)} u^i = 1,$$
$$p(x) + \sum_{i \in J_\delta(x)} u^i f_i'(x) = 0,$$
$$u^i \big( (f_i'(x), p(x)) + f_i(x) - \beta(x) \big) = 0, \quad i \in J_\delta(x). \tag{6.27}$$

Further, point $p = 0$, $\beta = F(x)$ obviously satisfies constraints
(6.26). Therefore

$$\beta(x) + \tfrac{1}{2} \|p(x)\|^2 \le F(x). \tag{6.28}$$
We now formulate the algorithm for solving the problem. Let $x_0$
be a certain initial approximation. Let points $x_j$, $j = 0, 1, \ldots, k$
be already constructed. Then

$$x_{k+1} = x_k + \alpha_k p_k \tag{6.29}$$

where $p_k = p(x_k)$. We choose $\alpha_k$ equal to $2^{-i_0}$ where $i_0$ is the first
of indices $i = 0, 1, \ldots$, which satisfies the following inequality:

$$F(x_k + 2^{-i} p_k) \le F(x_k) - 2^{-i} \varepsilon \|p_k\|^2, \qquad 0 < \varepsilon < \tfrac{1}{2}.$$




Thus the condition

$$F(x_{k+1}) \le F(x_k) - \alpha_k \varepsilon \|p_k\|^2 \tag{6.30}$$

is fulfilled. We now formulate the conditions for convergence of the
algorithm.
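
The iteration (6.29)-(6.30) can be sketched in code for the simplest nontrivial case. The sketch below is an illustration only, assuming Python with NumPy, exactly two functions ($m = 2$), and $\delta$ so large that both indices always belong to $J_\delta(x)$. Under these assumptions the Kuhn-Tucker conditions (6.27) reduce problem (6.26) to maximizing the concave function $u f_1 + (1-u) f_2 - \tfrac{1}{2}\|u f_1' + (1-u) f_2'\|^2$ over $0 \le u \le 1$, which has a closed-form solution. All function names and the test data are hypothetical.

```python
import numpy as np

def minimax_step_two(f1, g1, f2, g2):
    """Direction p(x) of problem (6.26) for two active functions.

    f1, f2 are the function values, g1, g2 the gradients at x.
    u is the multiplier of f1; the multiplier of f2 is 1 - u.
    """
    d = g1 - g2
    dd = float(d @ d)
    if dd == 0.0:
        u = 1.0 if f1 >= f2 else 0.0
    else:
        u = ((f1 - f2) - float(g2 @ d)) / dd
        u = min(1.0, max(0.0, u))
    return -(u * g1 + (1.0 - u) * g2)

def minimize_max_of_two(funcs, grads, x0, eps=0.25, tol=1e-12, max_iter=200):
    """Iteration (6.29) with the step rule leading to (6.30), 0 < eps < 1/2."""
    x = np.asarray(x0, dtype=float)
    F = lambda y: max(f(y) for f in funcs)
    for _ in range(max_iter):
        p = minimax_step_two(funcs[0](x), grads[0](x), funcs[1](x), grads[1](x))
        pp = float(p @ p)
        if pp <= tol:
            break
        alpha = 1.0
        # Halve alpha until F(x + alpha p) <= F(x) - alpha * eps * ||p||^2.
        while F(x + alpha * p) > F(x) - alpha * eps * pp and alpha > 1e-12:
            alpha *= 0.5
        x = x + alpha * p
    return x

# Hypothetical test problem in E^2: F(x) = max of two shifted paraboloids.
fa = lambda x: (x[0] - 1.0) ** 2 + x[1] ** 2
fb = lambda x: (x[0] + 1.0) ** 2 + x[1] ** 2
ga = lambda x: np.array([2.0 * (x[0] - 1.0), 2.0 * x[1]])
gb = lambda x: np.array([2.0 * (x[0] + 1.0), 2.0 * x[1]])
x_min = minimize_max_of_two([fa, fb], [ga, gb], [2.0, 1.0])
```

For this test problem the minimax point is the origin, where both functions equal 1 and the multipliers are $u^1 = u^2 = 1/2$. For general $m$ and finite $\delta$ the direction must come from a quadratic programming solver, as noted in Remark 3 for the analogous subsidiary problem.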

Lemma 6.2. $p(x) = 0$ if and only if the necessary conditions for a
minimum of $F(x)$ are satisfied at point $x$.

To prove the lemma we must recall the necessary conditions for
a minimum of $F(x)$ and use an argument analogous to that used for
proving lemma 5.1.

Theorem 6.4. Let $f_i(x)$ be continuously differentiable, domain
$\Omega = \{x : F(x) \le F(x_0)\}$ be bounded and $f_i'(x)$ satisfy in $\Omega$ Lipschitz'
condition with constant $L$. Then any limit point $x_*$ of sequence $\{x_k\}$,
$k = 0, 1, \ldots$, satisfies the necessary conditions for a minimum of
$F(x)$ with $x \in E^n$. If $f_i(x)$ are convex, then $x_*$ is the solution of the
problem.

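Proof. As in proving theorem 5.1, it is easy to obtain the following
estimates:

$$f_i(x_k + \alpha p_k) \le f_i(x_k) + \alpha (f_i'(x_k), p_k) + \alpha^2 L \|p_k\|^2, \quad i \in J_\delta(x_k),$$
$$f_i(x_k + \alpha p_k) \le F(x_k) - \delta + \alpha K \|p_k\|, \quad i \notin J_\delta(x_k),$$

where $K = \max_{i,\, x \in \Omega} \|f_i'(x)\|$.
If we use now the condition (see (6.26))

$$(f_i'(x_k), p_k) \le \beta_k - f_i(x_k), \qquad \beta_k = \beta(x_k),$$

and also (6.28), then the first estimate takes the following form:

$$f_i(x_k + \alpha p_k) \le (1 - \alpha) f_i(x_k) + \alpha \beta_k + \alpha^2 L \|p_k\|^2$$
$$\qquad \le F(x_k) - \alpha (F(x_k) - \beta_k) + \alpha^2 L \|p_k\|^2$$
$$\qquad \le F(x_k) - \frac{\alpha}{2} \|p_k\|^2 + \alpha^2 L \|p_k\|^2.$$

Further, since for $0 \le \alpha \le \alpha_k^1$,

$$\alpha_k^1 = \frac{\delta}{\|p_k\| \left( K + \tfrac{1}{2} \|p_k\| \right)},$$

we have

$$F(x_k) - \frac{\alpha}{2} \|p_k\|^2 \ge F(x_k) - \delta + \alpha K \|p_k\|,$$

we obtain for all $i$

$$f_i(x_k + \alpha p_k) \le F(x_k) - \frac{\alpha}{2} \|p_k\|^2 + \alpha^2 L \|p_k\|^2 \tag{6.31}$$

with $0 \le \alpha \le \alpha_k^1$. Therefore

$$F(x_k + \alpha p_k) \le F(x_k) - \alpha \|p_k\|^2 \left( \tfrac{1}{2} - \alpha L \right), \qquad 0 \le \alpha \le \alpha_k^1. \tag{6.32}$$

If now

$$0 \le \alpha \le \bar\alpha_k, \qquad \bar\alpha_k = \min\left\{ 1,\ \alpha_k^1,\ \frac{\tfrac{1}{2} - \varepsilon}{L} \right\}, \tag{6.33}$$

then

$$F(x_k + \alpha p_k) \le F(x_k) - \alpha \varepsilon \|p_k\|^2.$$

It follows immediately that inequality (6.30) holds, with

$$\alpha_k \ge \tfrac{1}{2} \bar\alpha_k, \tag{6.34}$$

after a finite number of reductions starting with unity.

It follows immediately from (6.30) that $\alpha_k \|p_k\|^2 \to 0$. This means
that $\|p_k\| \to 0$. Indeed, $\alpha_k^1 \ge \gamma > 0$ since by (6.27) $\|p(x)\|$
has an upper bound in $\Omega$. But it follows from (6.33), (6.34) that $\alpha_k$
also has a lower bound, a certain positive constant.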

Thus, $p_k \to 0$. Let now $x_*$ be a limit point of the sequence. Without
loss of generality, we can take that $x_k \to x_*$. Moreover, since
$u_k^i$, $i \in J_\delta(x_k)$, are nonnegative and their sum is equal to unity, we can,
setting $u_k^i = 0$, $i \notin J_\delta(x_k)$, take that $u_k^i \to u^i$ and $u^i \ge 0$, their
sum being

$$\sum_{i=1}^{m} u^i = 1. \tag{6.35}$$

We rewrite now (6.27) and (6.26) for points $x_k$ as follows:

$$p_k + \sum_{i=1}^{m} u_k^i f_i'(x_k) = 0,$$
$$u_k^i \big( (f_i'(x_k), p_k) + f_i(x_k) - \beta_k \big) = 0, \quad i = 1, \ldots, m,$$
$$(f_i'(x_k), p_k) + f_i(x_k) \le \beta_k, \quad i \in J_\delta(x_k). \tag{6.36}$$

It follows from the last inequality (6.36), if we choose $i \in J_\delta(x_k)$
such that $f_i(x_k) = F(x_k)$, that

$$\beta_k \ge f_i(x_k) - K \|p_k\| = F(x_k) - K \|p_k\|.$$

But (6.28) shows that $\beta_k \le F(x_k) - \tfrac{1}{2} \|p_k\|^2$. Therefore $\beta_k \to F(x_*)$.
Taking the limit in (6.36) we obtain:

$$\sum_{i=1}^{m} u^i f_i'(x_*) = 0,$$
$$u^i \big( f_i(x_*) - F(x_*) \big) = 0, \quad i = 1, \ldots, m,$$
$$u^i \ge 0. \tag{6.37}$$

But these are just the necessary conditions for $F(x)$ to attain its
minimum at point $x_*$ (see Chap. I). If $f_i(x)$ are convex, then these
conditions will be at the same time sufficient and this proves the
theorem.

Let us now give a local estimate of the convergence of the algo- 
rithm. 

Theorem 6.5. Let $x_*$ be the minimum point of $F(x)$ and functions
$f_i(x)$ be twice continuously differentiable. Besides, let the gradients
$f_i'(x_*)$, $i \in J_0(x_*)$, where $J_0(x_*) = \{i : f_i(x_*) = F(x_*)\}$, be such
that the differences

$$f_i'(x_*) - f_{i_0}'(x_*), \quad i \ne i_0, \quad i, i_0 \in J_0(x_*),$$

are linearly independent, the multipliers $u^i$ be strictly greater than
zero for $i \in J_0(x_*)$ and $(y, L''(x_*, u) y) > 0$ for all $y \ne 0$. Here

$$L(x, u) = \sum_{i=1}^{m} u^i f_i(x)$$

and $L''(x, u)$ is the matrix of second derivatives with respect to $x$. Then with
sufficiently small $\delta > 0$ and $\alpha > 0$ there is a neighbourhood of point $x_*$ such that
the process

$$x_{k+1} = x_k + \alpha p(x_k), \quad k = 0, 1, \ldots,$$

converges starting from any initial approximation $x_0$ of this region
and $\|x_k - x_*\| \le C q^k$, where $0 < q < 1$.

Proof. We shall give only the general scheme of the proof since
the complete proof is quite analogous to the proof of theorem 5.4
and in fact is reduced to it.

If we take

$$\bar J(x) = \{ i \in J_\delta(x) : (f_i'(x), p(x)) + f_i(x) - \beta(x) = 0 \}$$

then it can be shown (see lemma 5.2) that with a small $\delta$ we have
$J_\delta(x) = J_0(x_*) = \bar J(x)$ for all $x$ close to $x_*$. Therefore it follows
from (6.26) and (6.27) that vector $p(x)$ and the corresponding
Lagrange multipliers $u^i$ satisfy the system of equations

$$p(x) + \sum_{i \in J_0(x_*)} u^i f_i'(x) = 0,$$
$$(f_i'(x), p(x)) + f_i(x) = \beta(x), \quad i \in J_0(x_*),$$
$$\sum_{i \in J_0(x_*)} u^i = 1. \tag{6.38}$$

Let $i_0$ be an index from $J_0(x_*)$ and

$$\bar f_i(x) = f_i(x) - \bar f_0(x), \qquad \bar f_0(x) = f_{i_0}(x).$$

Then system (6.38) is equivalent to the following one:

$$p(x) + \bar f_0'(x) + \sum_{i \in \bar J_0(x_*)} u^i \bar f_i'(x) = 0,$$
$$(\bar f_i'(x), p(x)) + \bar f_i(x) = 0, \quad i \in \bar J_0(x_*),$$
$$\bar J_0(x_*) = J_0(x_*) \setminus \{i_0\}. \tag{6.39}$$

But this system is absolutely equivalent to system (5.51.1), (5.51.2).
As the proof of theorem 5.4 was reduced to a study of the properties
of $p(x)$, the solution of system (5.51.1), (5.51.2), it follows that
the further proof of theorem 6.5 is simply reduced to checking the
conditions of theorem 5.4. But it can be easily ascertained that the
assumptions of theorem 6.5 provide completely for the fulfilment
of the conditions of theorem 5.4 for functions $\bar f_i$ and this completes
the proof of the theorem.

The following theorem is an absolute analogue of theorem 5.5.
Theorem 6.6. Let the conditions of theorem 6.5 be fulfilled and,
besides, the number of indices in set $J_0(x_*)$ be equal to $n + 1$. In this
case, with a small $\delta$ the process

$$x_{k+1} = x_k + p(x_k) \tag{6.40}$$

converges at a quadratic rate to point $x_*$.

Proof. In the case under consideration, vector $p(x)$ is uniquely defined
by the system of equations

$$(\bar f_i'(x), p(x)) + \bar f_i(x) = 0, \quad i \in \bar J_0(x_*),$$

since vectors $\bar f_i'(x)$, $i \in \bar J_0(x_*)$, are linearly independent for $x$
close to $x_*$ by the assumptions. But then process (6.40) is just Newton's
method for solving the system of equations

$$\bar f_i(x) = 0, \quad i \in \bar J_0(x_*), \tag{6.41}$$

which by theorem 6.1 and remark 1 on p. 215, converges quadratically
in the neighbourhood of point $x_*$. Note that point $x_*$ satisfies
(6.41), for $f_i(x_*) = F(x_*)$, $i \in J_0(x_*)$, and therefore

$$\bar f_i(x_*) = f_i(x_*) - f_{i_0}(x_*) = 0, \quad i \in \bar J_0(x_*).$$


7. LOCAL ACCELERATION OF CONVERGENCE 


As was shown in Sec. 5, the linearization method, generally speaking,
converges at the rate of a geometric progression. In a number
of problems this can prove insufficient and the problem arises of
how to accelerate the convergence of the process.

In this section we shall describe methods that make this possible
provided an approximation sufficiently close to the solution has
been found. The last circumstance is a shortcoming of the process;
there are, however, no methods at present that permit one to construct
a process which, whatever the initial approximation, has an asymptotically
superlinear rate of convergence; this has been achieved only in the
problem of unconstrained minimization.

The methods described below are based on the following idea.
The minimization problem is reduced to a certain system of nonlinear
equations and then Newton's method or its modification is
applied to the solving of this system. At the end of this section we
shall describe a method that uses this idea directly, i.e. the necessary
conditions for a minimum will be established and Newton's
method applied to the solving of the equations obtained. Such a
method has many shortcomings of which the principal one is the
necessity of calculating second derivatives of the original functions.
Therefore, this method can be applied only to problems in which
such derivatives are easily calculated.

A second method is based on the fact that point $x_*$ is the solution
of minimization problem (5.1) only if it satisfies the equation
$p(x_*) = 0$, where vector $p(x)$ is the solution of the subsidiary
problem (5.4). We shall describe a method that permits one to solve
a system of nonlinear equations without calculating derivatives.
As was mentioned above, this method will converge only from a sufficiently
good initial approximation.


Formulation of the Problem. 
Basic Formulas 


It is required to solve a system of equations

$$p(x) = 0 \tag{7.1}$$

where $p(x)$ is a vector with components $p^i(x)$, $i = 1, \ldots, n$, $x \in E^n$.
Note that $p(x)$ is an arbitrary vector-function that is not as yet
connected with the problem of mathematical programming.
Let $x_*$ be the solution of system (7.1). We shall always assume
further on that $p(x)$ is a vector-function which is differentiable in
the neighbourhood of point $x_*$ and that the matrix of derivatives

$$p'(x) = \left\{ \frac{\partial p^i(x)}{\partial x^j} \right\}, \quad i, j = 1, \ldots, n,$$

satisfies Lipschitz' condition, i.e.

$$\|p'(x) - p'(y)\| \le L \|x - y\|$$

where all the norms are Euclidean.

Further without loss of generality, we can take that $x_* = 0$. We
denote

$$p'(0) = A, \qquad \omega(x) = p(x) - Ax,$$
$$\omega(x, y) = \frac{1}{\|x - y\|} \big[ p(x) - p(y) - A(x - y) \big]. \tag{7.2}$$

Suppose that matrix $A$ is nonsingular so that the following estimates
hold:

$$m \|x\| \le \|Ax\| \le M \|x\| \tag{7.3}$$

where $M \ge m > 0$.
Lemma 7.1. The estimates

$$\|\omega(x)\| \le C_1 \|x\|^2, \qquad \|\omega(x, y)\| \le C_2 \max\{\|x\|, \|y\|\}$$

hold.

Proof. Let $(p^i)'(x)$ be the gradient of function $p^i(x)$. Then by
Taylor's formula we have

$$p^i(x) = p^i(0) + ((p^i)'(0), x) + ((p^i)'(z) - (p^i)'(0), x)$$

where $z = \theta x$, $0 < \theta < 1$. Using the fact that $p^i(0) = 0$ and Lipschitz'
condition for $p'(x)$, we obtain that

$$|p^i(x) - ((p^i)'(0), x)| \le L \|x\|^2, \quad i = 1, \ldots, n.$$

Hence, we have

$$\|\omega(x)\| = \|p(x) - p'(0) x\| \le C_1 \|x\|^2.$$

Further

$$p^i(y) = p^i(x) + ((p^i)'(x), y - x) + ((p^i)'(z) - (p^i)'(x), y - x)$$

where $z = \theta x + (1 - \theta) y$, $0 < \theta < 1$. Therefore

$$p^i(y) - p^i(x) - ((p^i)'(0), y - x) = ((p^i)'(x) - (p^i)'(0), y - x) + ((p^i)'(z) - (p^i)'(x), y - x).$$

Hence after simple transformations (using Lipschitz' condition),
we obtain

$$|p^i(y) - p^i(x) - ((p^i)'(0), y - x)| \le L \|x\|\,\|y - x\| + L \|z - x\|\,\|y - x\|$$
$$\qquad = L \|y - x\| \big( \|x\| + (1 - \theta) \|y - x\| \big)$$
$$\qquad \le L \|y - x\| \big( (2 - \theta) \|x\| + (1 - \theta) \|y\| \big)$$
$$\qquad \le 3L \|y - x\| \max\{\|x\|, \|y\|\}.$$

The second statement of the lemma follows immediately from the
last inequality.


Let now points $x_1, x_2, \ldots, x_n$ be already constructed, $p(x_k) \ne 0$,
$k = 1, \ldots, n$, and let $e_k$ be unit vectors in the direction of the $k$-th
coordinate axis.

We introduce the notations:

$$y_k = x_k + \|p(x_k)\|\, e_k,$$
$$r_k = y_k - x_k = \|p(x_k)\|\, e_k,$$
$$z_k = p(y_k) - p(x_k), \quad k = 1, \ldots, n.$$

We shall introduce a measure of the linear independence of an
arbitrary set of vectors $b_i$, $i = 1, \ldots, n$. We take

$$\Delta(b_1, \ldots, b_n) = \min_{\sum_{i=1}^{n} |a_i| = 1} \Big\| \sum_{i=1}^{n} a_i \frac{b_i}{\|b_i\|} \Big\|.$$

It is easily seen that $\Delta(b_1, \ldots, b_n) > 0$ if and only if vectors
$b_1, \ldots, b_n$ are linearly independent. Note also that

$$\Delta(e_1, \ldots, e_n) = \frac{1}{\sqrt{n}}.$$

Lemma 7.2. There is a neighbourhood of point $x_* = 0$ such that

$$\Delta(z_1, \ldots, z_n) \ge \gamma > 0,$$

provided $x_1, \ldots, x_n$ are in this region.

Proof. By the definition of $\omega(x, y)$, we have

$$z_k = A r_k + \omega(y_k, x_k) \|r_k\| = \|p(x_k)\| \big( A e_k + \omega(y_k, x_k) \big).$$

Therefore

$$\frac{z_k}{\|z_k\|} = \frac{A e_k + \omega(y_k, x_k)}{\|A e_k + \omega(y_k, x_k)\|}.$$

If $x_k \to 0$, then

$$\frac{z_k}{\|z_k\|} \to \frac{A e_k}{\|A e_k\|}.$$

However, it is easy to see that $\Delta(z_1, \ldots, z_n)$ depends continuously
on $z_k / \|z_k\|$. Therefore for $x_k$ sufficiently close to zero we have

$$\Delta(z_1, \ldots, z_n) \ge \tfrac{1}{2} \Delta(A e_1, \ldots, A e_n).$$

But $\Delta(A e_1, \ldots, A e_n) > 0$, for vectors $A e_k$, $k = 1, \ldots, n$, are
simply columns of matrix $A$, and since matrix $A$ is nonsingular,
its columns are linearly independent. Thus

$$\Delta(z_1, \ldots, z_n) \ge \tfrac{1}{2} \Delta(A e_1, \ldots, A e_n) > 0$$

for all $x_k$ from the neighbourhood of $x_* = 0$. Q.E.D.


Let $\delta > 0$ be the radius of the region about zero in which lemmas
7.1 and 7.2 hold. Let points $x_1, \ldots, x_n$ be chosen in this region.
Let us find quantities $\beta_k$, $k = 1, \ldots, n$, from the system of equations

$$-p(x_n) = \sum_{k=1}^{n} \beta_k z_k. \tag{7.4}$$

By lemma 7.2, this system is solvable. We take

$$x_{n+1} = x_n + \sum_{k=1}^{n} \beta_k r_k. \tag{7.5}$$

Let us estimate the norm of $x_{n+1}$. Since

$$p(x_n) = A x_n + \omega(x_n),$$
$$z_k = A r_k + \omega(y_k, x_k) \|r_k\|, \tag{7.6}$$

we obtain from (7.4) that

$$-A x_n - \omega(x_n) = \sum_{k=1}^{n} \beta_k A r_k + \sum_{k=1}^{n} \beta_k\, \omega(y_k, x_k) \|r_k\|$$

or

$$A x_{n+1} = -\omega(x_n) - \sum_{k=1}^{n} \beta_k\, \omega(y_k, x_k) \|r_k\|.$$

It follows from the last equality that

$$m \|x_{n+1}\| \le \|A x_{n+1}\| \le \|\omega(x_n)\| + \sum_{k=1}^{n} |\beta_k|\, \|r_k\|\, \|\omega(y_k, x_k)\|. \tag{7.7}$$
But by lemma 7.1

$$\|\omega(y_k, x_k)\| \le C_2 \max\{\|y_k\|, \|x_k\|\} = C_2 \max\{\|x_k + \|p(x_k)\| e_k\|,\ \|x_k\|\} \le C_2 \big( \|x_k\| + \|p(x_k)\| \big).$$

In the region under consideration

$$\|p(x_k)\| = \|A x_k + \omega(x_k)\| \le M \|x_k\| + C_1 \|x_k\|^2.$$

Therefore

$$\|\omega(y_k, x_k)\| \le C_2 (1 + M + C_1 \delta) \|x_k\| = C_3 \|x_k\|.$$

Using this inequality we can rewrite estimate (7.7) in the following
form:

$$\|x_{n+1}\| \le \frac{1}{m} \Big\{ C_1 \|x_n\|^2 + C_3 \max_{1 \le k \le n} \|x_k\| \sum_{k=1}^{n} |\beta_k|\, \|r_k\| \Big\}. \tag{7.8}$$

Note now that

$$\Big\| \sum_{k=1}^{n} \beta_k z_k \Big\| = \Big( \sum_{k=1}^{n} |\beta_k|\, \|z_k\| \Big) \Big\| \sum_{k=1}^{n} \frac{\beta_k \|z_k\|}{\sum_{j=1}^{n} |\beta_j|\, \|z_j\|} \cdot \frac{z_k}{\|z_k\|} \Big\| \ge \Delta(z_1, \ldots, z_n) \Big( \sum_{k=1}^{n} |\beta_k|\, \|z_k\| \Big).$$

Taking into account (7.4), we obtain

$$\|p(x_n)\| \ge \Delta(z_1, \ldots, z_n) \Big( \sum_{k=1}^{n} |\beta_k|\, \|z_k\| \Big). \tag{7.9}$$

Further

$$\|z_k\| = \|A r_k + \omega(y_k, x_k) \|r_k\|\| \ge \|A r_k\| - \|r_k\|\, \|\omega(y_k, x_k)\| \ge \|r_k\| \big( m - C_3 \|x_k\| \big) \ge \|r_k\| \big( m - C_3 \max_{1 \le k \le n} \|x_k\| \big).$$

Therefore

$$\sum_{k=1}^{n} |\beta_k|\, \|z_k\| \ge \Big( \sum_{k=1}^{n} |\beta_k|\, \|r_k\| \Big) \big( m - C_3 \max_{1 \le k \le n} \|x_k\| \big).$$

If

$$\max_{1 \le k \le n} \|x_k\| < \frac{m}{C_3}, \tag{7.10}$$

then inequality (7.9) yields now

$$\sum_{k=1}^{n} |\beta_k|\, \|r_k\| \le \frac{\|p(x_n)\|}{\Delta(z_1, \ldots, z_n) \big( m - C_3 \max_{1 \le k \le n} \|x_k\| \big)}. \tag{7.11}$$

Taking into account that

$$\|p(x_n)\| = \|A x_n + \omega(x_n)\| \le M \|x_n\| + C_1 \|x_n\|^2 \le C_4 \|x_n\|$$

and also (7.11) and lemma 7.2, we can finally rewrite (7.8) as follows:

$$\|x_{n+1}\| \le \frac{\|x_n\|}{m} \Big[ C_1 \|x_n\| + \frac{C_3 C_4 \max_{1 \le k \le n} \|x_k\|}{\gamma \big( m - C_3 \max_{1 \le k \le n} \|x_k\| \big)} \Big]. \tag{7.12}$$

We formulate now the result obtained in the form of a lemma.

Lemma 7.3. If points $x_k$, $k = 1, \ldots, n$, are chosen in a region
about point $x_* = 0$ such that the conditions of lemmas 7.1 and 7.2
and inequality (7.10) are satisfied, then estimate (7.12) holds true.


Algorithm 


We formulate now the algorithm for solving the system of equations
(7.1).
Choose initial points $x_1, x_2, \ldots, x_n$. Let points $x_1, \ldots, x_k$, $k \ge n$,
be already constructed. Then point $x_{k+1}$ is constructed by
the following formula:

$$x_{k+1} = x_k + \sum_{i=1}^{n} \beta_k^i\, r_{k-n+i} \tag{7.13}$$

where

$$r_j = \|p(x_j)\|\, e_{m(j)}, \qquad y_j = x_j + r_j, \qquad z_j = p(y_j) - p(x_j)$$

and the quantities $\beta_k^i$, $i = 1, \ldots, n$, are determined from the
system of equations

$$-p(x_k) = \sum_{i=1}^{n} \beta_k^i\, z_{k-n+i}. \tag{7.14}$$

Index $m(j)$ is calculated by the following rule: if $j = ln + p$,
$1 \le p \le n - 1$, where $l$ is an integer, then $m(j) = p$; and if $j = ln$,
then $m(j) = n$.

Thus vectors $r_1, r_2, \ldots, r_k$ are proportional to unit vectors of
the coordinate axes which are taken in cyclic order.

It can be seen from the above formulas that the scheme of the
algorithm is simple enough. At every step it comprises the calculation
of $p(x)$ at points $x_k$ and $y_k$, and the solving of the system of equations
(7.14).
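
The scheme (7.13)-(7.14) can be sketched as follows. This is a minimal illustration, assuming Python with NumPy and a dense linear solve at every step; the function name `cyclic_difference_solve`, its stopping tolerance, and the test system are hypothetical and not taken from the book. The sketch also assumes, as the text does, that $p(x_j) \ne 0$ at the points where differences are formed.

```python
import numpy as np

def cyclic_difference_solve(p, xs, tol=1e-10, max_iter=50):
    """Derivative-free iteration (7.13)-(7.14) for p(x) = 0.

    p  : callable returning an n-vector for an n-vector argument
    xs : list of n starting points x_1, ..., x_n close to the solution
    """
    n = len(xs)

    def probe(x, j):
        # r_j = ||p(x_j)|| e_{m(j)}, z_j = p(x_j + r_j) - p(x_j); j is 1-based.
        px = p(x)
        r = np.zeros(n)
        r[(j - 1) % n] = np.linalg.norm(px)
        return px, r, p(x + r) - px

    R, Z = [], []
    for j, x in enumerate(xs, start=1):
        x = np.asarray(x, dtype=float)
        px, r, z = probe(x, j)
        R.append(r)
        Z.append(z)

    k = n  # x and px now refer to the last starting point x_n
    for _ in range(max_iter):
        if np.linalg.norm(px) <= tol:
            break
        # Solve (7.14): -p(x_k) = sum_i beta_i z_{k-n+i}.
        beta = np.linalg.solve(np.column_stack(Z), -px)
        # Step (7.13): x_{k+1} = x_k + sum_i beta_i r_{k-n+i}.
        x = x + np.column_stack(R) @ beta
        k += 1
        px, r, z = probe(x, k)
        R = R[1:] + [r]  # drop the oldest pair, append the newest
        Z = Z[1:] + [z]
    return x

# Hypothetical test system with root c and nonsingular derivative matrix A.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
c = np.array([1.0, 2.0])

def p_example(x):
    e = x - c
    return A @ e + 0.1 * e ** 2

root7 = cyclic_difference_solve(p_example, [[1.1, 2.1], [0.95, 1.9]])
```

Each iteration costs two evaluations of $p$ (at $x_k$ and $y_k$) and one linear solve, which matches the operation count stated above.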

Theorem 7.1. Let $\delta_1 > 0$ be such that for all $x$ satisfying the
inequality $\|x\| \le \delta_1$ the conditions of lemmas 7.1 and 7.2 are fulfilled
and moreover, the following two inequalities:

$$m - C_3 \|x\| \ge \frac{m}{2}, \tag{7.15}$$
$$\frac{\delta_1}{m} \Big[ C_1 + \frac{2 C_3 C_4}{\gamma m} \Big] < 1. \tag{7.16}$$

Let $x_1, \ldots, x_n$ be chosen such that $\|x_k\| \le \delta_1$, $k = 1, \ldots, n$.
Then the algorithm described above converges to the solution $x_*$ of equations
(7.1) at a superlinear rate.

Proof. First, we show that $\|x_k\| \le \delta_1$ for all points $x_k$ constructed
by the algorithm. Indeed, if $x_{k-n+1}, \ldots, x_k$ are in the $\delta_1$-region
about point $x_*$, then the conditions of lemma 7.3 are satisfied for
points $x_{k-n+1}, x_{k-n+2}, \ldots, x_k$ and therefore the following inequality
analogous to inequality (7.12) holds:

$$\|x_{k+1}\| \le \frac{\|x_k\|}{m} \Big[ C_1 \|x_k\| + \frac{C_3 C_4 \max_{1 \le i \le n} \|x_{k-n+i}\|}{\gamma \big( m - C_3 \max_{1 \le i \le n} \|x_{k-n+i}\| \big)} \Big].$$

Hence it follows, taking into account (7.15), that

$$\|x_{k+1}\| \le \|x_k\| \max_{1 \le i \le n} \|x_{k-n+i}\|\, C_5 \tag{7.17}$$

where

$$C_5 = \frac{1}{m} \Big[ C_1 + \frac{2 C_3 C_4}{\gamma m} \Big].$$

But $\|x_{k-n+i}\| \le \delta_1$ by assumption. Therefore

$$\|x_{k+1}\| \le \|x_k\|\, \delta_1 C_5 < \|x_k\| \le \delta_1,$$

Q.E.D. Moreover, it follows from the last inequality, with the
notation $q_0 = \delta_1 C_5$, that

$$\|x_{k+1}\| \le q_0 \|x_k\|. \tag{7.18}$$

Since by (7.16) $q_0 < 1$, we have the estimate $\|x_k\| \le q_0^{k-n} \|x_n\|$,
i.e. $x_k \to 0$, and this proves the first statement of the theorem.
Further, from (7.17) we obtain that

$$\frac{\|x_{k+1}\|}{\|x_k\|} \le C_5 \max_{1 \le i \le n} \|x_{k-n+i}\|. \tag{7.19}$$

Since $x_k \to 0$, the estimate means that

$$\frac{\|x_{k+1}\|}{\|x_k\|} \to 0.$$

The last relation shows that $x_k \to 0$ at a faster rate than that of any
geometric progression. The theorem is proved.
We shall now give more precise bounds on the rate of convergence.
We set $v_k = C_5 \|x_k\|$. Then (7.17) can be written in the
form

$$v_{k+1} \le v_k \max_{1 \le i \le n} v_{k-n+i}. \tag{7.20}$$

Let us take now $\bar v_j = \max_{1 \le i \le n} v_i$, $j = 1, \ldots, n$, and determine $\bar v_k$,
$k > n$, by the following recursive formula:

$$\bar v_{k+1} = \bar v_k \max_{1 \le i \le n} \bar v_{k-n+i}. \tag{7.21}$$

It is now easy to see that $v_k \le \bar v_k$ for all $k$. Further, since

$$v_i = C_5 \|x_i\| \le C_5 \delta_1 = q_0 < 1, \quad i = 1, \ldots, n,$$

we have $\bar v_i \le q_0 < 1$, $i = 1, \ldots, n$, and therefore sequence $\{\bar v_k\}$
decreases monotonically. This fact can be proved in an elementary
way by induction on $k$. It follows that $\max_{1 \le i \le n} \bar v_{k-n+i} = \bar v_{k-n+1}$, and
(7.21) can be rewritten in the form

$$\bar v_{k+1} = \bar v_k\, \bar v_{k-n+1}. \tag{7.22}$$

We denote $w_k = \ln \bar v_k$. Then

$$w_{k+1} = w_k + w_{k-n+1}, \quad k \ge n,$$
$$w_k = \ln \bar v_k, \quad k = 1, \ldots, n. \tag{7.23}$$

It follows from Ostrowski's results (his theorems 12.1 and 12.2) that

$$\frac{w_{k+1}}{w_k} \to \lambda_n \tag{7.24}$$

where $\lambda_n$ is the greatest positive root of the equation

$$\varphi(\lambda) = \lambda^n - \lambda^{n-1} - 1 = 0. \tag{7.25}$$

As $\varphi(1) = -1 < 0$, and for large $\lambda$, $\varphi(\lambda) > 0$, we have $\lambda_n > 1$.
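
To get a feeling for the order $\lambda_n$ in (7.24), the root of (7.25) can be located numerically. The sketch below is only an illustration in plain Python; the function name `convergence_order` is hypothetical. It relies on two elementary observations that are not stated in the text: $\varphi(2) = 2^{n-1} - 1 \ge 0$, and $\varphi$ is increasing for $\lambda \ge 1$, so bisection on $[1, 2]$ finds the only root greater than 1.

```python
def convergence_order(n, iters=80):
    """Greatest positive root of lambda^n - lambda^(n-1) - 1 = 0, by bisection."""
    phi = lambda lam: lam ** n - lam ** (n - 1) - 1.0
    lo, hi = 1.0, 2.0  # phi(1) = -1 < 0 and phi(2) = 2^(n-1) - 1 >= 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if phi(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return hi

orders = {n: convergence_order(n) for n in (1, 2, 3, 5, 10)}
```

For $n = 1$ the root is 2 and for $n = 2$ it is $(1 + \sqrt{5})/2 \approx 1.618$; the root decreases towards 1 as the dimension $n$ grows.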


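It follows from (7.24) that for any $\varepsilon > 0$, $\lambda_n - \varepsilon > 1$, there is
a number $k(\varepsilon)$ such that for $k \ge k(\varepsilon)$

$$\frac{w_{k+1}}{w_k} \ge \lambda_n - \varepsilon \quad \text{or} \quad \frac{\ln \bar v_{k+1}}{\ln \bar v_k} \ge \lambda_n - \varepsilon.$$

As $\ln \bar v_k < 0$ ($\bar v_k \le q_0 < 1$), we have for $k \ge k(\varepsilon)$

$$\ln \bar v_{k+1} \le (\lambda_n - \varepsilon) \ln \bar v_k = \ln \bar v_k^{\lambda_n - \varepsilon}$$

or $\bar v_{k+1} \le \bar v_k^{\lambda_n - \varepsilon}$, $k \ge k(\varepsilon)$. It follows from the last formula that
$\bar v_k \le \bar v_{k(\varepsilon)}^{(\lambda_n - \varepsilon)^{k - k(\varepsilon)}}$. But the sequence $\{\bar v_k\}$ decreases monotonically
and $\bar v_k \le q_0 < 1$. Therefore

$$\bar v_k \le q_0^{(\lambda_n - \varepsilon)^{k - k(\varepsilon)}}, \quad k \ge k(\varepsilon). \tag{7.26}$$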
Theorem 7.2. If the conditions of theorem 7.1 are fulfilled, then for
every $\varepsilon > 0$, $\lambda_n - \varepsilon > 1$, where $\lambda_n$ is the greatest root of the equation
$\lambda^n - \lambda^{n-1} - 1 = 0$, there is a number $k(\varepsilon)$ such that for all $k \ge k(\varepsilon)$
the inequality

$$\|x_k\| \le \frac{1}{C_5}\, q_0^{(\lambda_n - \varepsilon)^{k - k(\varepsilon)}}, \qquad q_0 < 1, \tag{7.27}$$

holds.

Proof. Recall that $v_k = C_5 \|x_k\|$, $v_k \le \bar v_k$. These inequalities
and (7.26) yield immediately the result required.

Computational Aspects.
Application to the Problem
of Mathematical Programming


The algorithm described in the preceding subsection is simple
enough. It requires at every step the calculation of vector $p(x)$ at
points $x_k$ and $y_k$ and the solving of the system of linear equations
(7.14). If we denote by $Z_k$ the matrix whose columns are $z_{k-n+i}$,
$i = 1, \ldots, n$, then equations (7.14) can be rewritten in the form
$Z_k \beta^k = -p(x_k)$, where $\beta^k$ is a column-vector whose components
are $\beta_k^i$, $i = 1, \ldots, n$.

It follows from the algorithm that matrices $Z_k$ and $Z_{k+1}$ differ
essentially by one column: column $z_{k+1}$ is substituted for column $z_k$ and
$z_{k-n+i+1}$, $i \le n - 1$, for $z_{k-n+i}$, so that only $z_{k-n+1}$ drops out and
only $z_{k+1}$ is new. Therefore to calculate
$Z_{k+1}^{-1}$, one can proceed as in the subsections on pp. 76 and 79.
Note that the procedure leads to the accumulation of calculation
errors. Therefore, if the calculation of $p(y_j)$ requires considerably
more operations than the solving of system (7.14), the standard
program for solving a system of linear equations should be used
for calculating $\beta^k$ rather than using recursive formulas.

We now turn again to the problem (5.1)-(5.2) discussed in Sec. 5.
According to lemma 5.1, in order to find a local minimum it suffices
to solve the equation $p(x) = 0$, where $p(x)$ is the solution of problem
(5.4). If the assumptions of theorem 5.4 hold, then by lemmas
5.2, 5.4 in a sufficiently small region about the solution $x_*$, the conditions
of theorem 7.1 are also satisfied. Therefore, the application
of the algorithm described in this section makes it possible to accelerate
the convergence of the linearization method. In applying
this method one should use as $p(x)$ the vector that is the solution of
problem (5.4).


Minimization Problem 
with Equality Constraints 


Let us consider the problem of minimization of the function $f_0(x)$
with the constraints

$$f_i(x) = 0, \quad i = 1, \dots, m. \tag{7.28}$$

Let $x_*$ be the solution of the problem and let the following assumptions
hold.

(a) The functions $f_i(x)$ are twice continuously differentiable and
their second derivatives satisfy a Lipschitz condition.

(b) At the point $x_*$ the gradients $f_i'(x_*)$, $i = 1, \dots, m$, are linearly
independent, so that the necessary conditions for a minimum at $x_*$
are satisfied in their regular form (see Chap. I, Sec. 4). Thus, there are
Lagrange multipliers $u^i$, $i = 1, \dots, m$, such that

$$f_0'(x_*) + \sum_{i=1}^{m} u^i f_i'(x_*) = 0, \qquad f_i(x_*) = 0, \quad i = 1, \dots, m. \tag{7.29}$$


(c) The sufficient conditions for a local minimum hold, i.e.
$(y, L''(x_*, u)\,y) > 0$ if $y \ne 0$ and $(f_i'(x_*), y) = 0$, $i = 1, \dots, m$.
Here $L(x, u) = f_0(x) + \sum_{i=1}^{m} u^i f_i(x)$ and $L''(x, u)$
is the matrix of second derivatives of $L(x, u)$ with respect to $x$.

Theorem 7.3. Let the above conditions (a), (b), (c) be fulfilled. Then
the sequences $\{x_k\}$, $\{u_k^i\}$, $i = 1, \dots, m$, $k = 0, 1, \dots$, calculated
by the recursive formulas

$$L''(x_k, u_k)\,p_k + \sum_{i=1}^{m} \Delta u_k^i\, f_i'(x_k) + L'(x_k, u_k) = 0,$$
$$(f_i'(x_k), p_k) + f_i(x_k) = 0, \quad i = 1, \dots, m, \tag{7.30}$$
$$x_{k+1} = x_k + p_k,$$
$$u_{k+1}^i = u_k^i + \Delta u_k^i, \quad i = 1, \dots, m, \tag{7.31}$$

converge to $x_*$ and $u^i$ respectively at a quadratic rate for any
initial approximation $x_0$, $u_0^i$, $i = 1, \dots, m$, sufficiently close to the
solution $x_*$, $u^i$, $i = 1, \dots, m$.

Proof. The process defined by formulas (7.30), (7.31) is simply the
one generated by Newton's method when it is applied to system (7.29).
Therefore, in order to prove the theorem it suffices to check, using
remark 1 on p. 210, that the matrix of the first derivatives of the
left-hand sides of (7.29) with respect to all $x$ and $u^i$ is nonsingular.

If we denote by $f'(x)$ the matrix whose rows are $f_i'(x)$, $i = 1, \dots, m$,
then it is easy to see that the matrix of the first derivatives of the
left-hand sides of (7.29) has the following block form, of size $(n+m) \times (n+m)$:

$$\begin{pmatrix} L''(x_*, u) & f'^{*}(x_*) \\ f'(x_*) & 0 \end{pmatrix}.$$

In order to ascertain that this matrix is nonsingular, it is sufficient
to show that the homogeneous system of equations

$$L''(x_*, u)\,\bar y + f'^{*}(x_*)\,\bar u = 0, \qquad f'(x_*)\,\bar y = 0 \tag{7.32}$$

has only the zero solution. In this system, $\bar y \in E^n$ and $\bar u$ is a vector whose
components are $\bar u^i$, $i = 1, \dots, m$. Let $\bar y$, $\bar u$ be a solution of system
(7.32). Performing scalar multiplication of the first of equations
(7.32) by $\bar y$, we obtain on the basis of the second equation that

$$(\bar y, L''(x_*, u)\,\bar y) + (\bar y, f'^{*}(x_*)\,\bar u) = (\bar y, L''(x_*, u)\,\bar y) + (f'(x_*)\,\bar y, \bar u) = (\bar y, L''(x_*, u)\,\bar y) = 0.$$

But by assumption (c), the last equality shows that $\bar y = 0$.
Therefore, the first of relations (7.32) can be rewritten as follows:

$$f'^{*}(x_*)\,\bar u = \sum_{i=1}^{m} \bar u^i f_i'(x_*) = 0,$$

and this is possible only if $\bar u^i = 0$, $i = 1, \dots, m$, since the vectors
$f_i'(x_*)$ are linearly independent by assumption (b).

Thus we have shown that the conditions of convergence of Newton's
method are fulfilled and consequently the theorem is proved.
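The recursion (7.30)-(7.31) can be tried out on a small instance. The sketch below is only an illustration under assumptions that are not in the text: the test problem (minimize $x^1 + x^2$ subject to $(x^1)^2 + (x^2)^2 - 2 = 0$, with solution $x_* = (-1, -1)$, $u = 1/2$), the starting point, and the helper names `solve_linear` and `newton_lagrange` are all chosen here for convenience.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def newton_lagrange(x, u, steps=8):
    """Iterate (7.30)-(7.31) for the assumed example
    f0 = x1 + x2, f1 = x1^2 + x2^2 - 2 (one equality constraint)."""
    for _ in range(steps):
        x1, x2 = x
        grad_f1 = [2.0 * x1, 2.0 * x2]                          # f1'(x_k)
        grad_l = [1.0 + u * grad_f1[0], 1.0 + u * grad_f1[1]]   # L'(x_k, u_k)
        f1 = x1 * x1 + x2 * x2 - 2.0
        # Block matrix [[L'', f1'^T], [f1', 0]]; for this example L'' = 2 u I.
        matrix = [
            [2.0 * u, 0.0, grad_f1[0]],
            [0.0, 2.0 * u, grad_f1[1]],
            [grad_f1[0], grad_f1[1], 0.0],
        ]
        p1, p2, du = solve_linear(matrix, [-grad_l[0], -grad_l[1], -f1])
        x = [x1 + p1, x2 + p2]      # x_{k+1} = x_k + p_k
        u = u + du                  # u_{k+1} = u_k + delta u_k
    return x, u


x_final, u_final = newton_lagrange([-1.2, -0.8], 0.4)
print(x_final, u_final)
```

The starting point is deliberately taken close to the solution, in line with the local character of theorem 7.3; from a remote starting point the same iteration need not converge.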


8. METHOD OF PENALTY FUNCTIONS


The method of penalty functions is one of the simplest and widely 
known of the methods for solving the problem of mathematical 
programming. The basic idea of the method consists in approxi- 
mately reducing the constrained minimization problem to the uncon- 
strained minimization of a certain function. The subsidiary function 
is chosen so that it coincides with the function to be minimized in 
the admissible domain and increases steeply outside it. 

Suppose we study the problem of minimization of the function $f_0(x)$,
$x \in E^n$, with the constraints

$$f_i(x) \le 0, \quad i = 1, \dots, m. \tag{8.1}$$

All the functions $f_i(x)$, $i = 0, 1, \dots, m$, are continuous.
We introduce the notations

$$\varphi_0(t) = \begin{cases} t^2, & t > 0, \\ 0, & t \le 0, \end{cases} \qquad \varphi_1(t) = \begin{cases} t, & t > 0, \\ 0, & t \le 0. \end{cases} \tag{8.2}$$

Let us compose the function

$$\psi(x, r) = r \sum_{i=1}^{m} \varphi_0(f_i(x)). \tag{8.3}$$

It is easy to see that

$$\psi(x, r) = 0, \quad x \in \Omega,$$

where

$$\Omega = \{x : f_i(x) \le 0, \ i = 1, \dots, m\}.$$

If $x \notin \Omega$, then $\psi(x, r) > 0$ and $\psi(x, r) \to +\infty$ as $r \to +\infty$. The
subsidiary problem is now the minimization of the function

$$F(x, r) = f_0(x) + \psi(x, r). \tag{8.4}$$


It is natural to expect that the solution $x(r)$ of this problem will
be close to the solution of the original one. The precise conditions
under which this holds are formulated below.

Note that the choice of the function $\psi(x, r)$ made above is not the
only one permissible. It is sufficient for the function
to have certain general properties that ensure the convergence of
the method. The method has different properties depending on the
choice of $\psi(x, r)$. In particular, if we set

$$\psi(x, r) = r \max \{0, f_1(x), \dots, f_m(x)\},$$

then the linearization method described in Sec. 5 can be used
for the minimization of function (8.4). In this case, as was
shown in Sec. 5, it is not necessary that $r$ should tend to infinity.
However, $F(x, r)$ will not be a smooth function.

In the general case, $F(x, r)$ is constructed so as to be smooth and
to make it possible to apply one of the methods of Chap. II, which
converge at a fast rate. Unfortunately, in this case $r$ must tend
to infinity, and this involves a number of implicit difficulties
which, in the opinion of the authors, considerably impair the value
of the penalty function method. We shall discuss these difficulties
below and give a cursory description of one more method whose
idea is close to that of the penalty function method.


Substantiation of the Penalty Function Method 


Let a certain continuous function $\psi(x, r)$ have the following properties:

(1) $\psi(x, r) = 0$ if $x \in \Omega$; $\psi(x, r) > 0$ if $x \notin \Omega$; and $\psi(x_k, r_k) \to +\infty$
if $x_k \to x_0$, $x_0 \notin \Omega$, $r_k \to +\infty$;

(2) $\psi(x, r)$ increases monotonically with increasing $r$.

Theorem 8.1. Let the set

$$\Omega_C(r) = \{x : F(x, r) \le C\}, \qquad F(x, r) = f_0(x) + \psi(x, r),$$

be compact. Then the function $F(x, r)$ assumes its minimum $m(r)$ over all
$x$ at a certain point $x(r)$ and $m(r) \le m$, where

$$m = \min_{x \in \Omega} f_0(x), \qquad m(r) \to m,$$

and $m(r)$ increases monotonically with increasing $r$. Moreover, if
$x(r_k) \to x_0$, $k \to \infty$, $r_k \to \infty$, then $x_0$ is the solution of the original
problem.

Proof. Let $\tilde x$ be a point of $\Omega$ and $C = f_0(\tilde x)$. Then the set $\tilde\Omega$ of those $x \in \Omega$
for which $f_0(x) \le C$ is a closed subset of the compact set $\Omega_C(r)$. Indeed,
for $x \in \tilde\Omega$, by the properties of $\psi(x, r)$ we have

$$f_0(x) + \psi(x, r) = f_0(x) \le C,$$

i.e. $x \in \Omega_C(r)$. But it is clear that the minimum of $f_0(x)$ in $\Omega$ must
be in the subset $\tilde\Omega$. Therefore, we should seek the minimum of the
continuous function $f_0(x)$ in the compact set $\tilde\Omega$. Since a continuous function
attains its minimum in a compact set, it follows that problem (8.1)
is solvable. Analogous reasoning shows that the function $F(x, r)$
assumes its minimum $m(r)$ at a certain point $x(r)$.

Let $x_*$ be a certain point of the minimum of $f_0(x)$ in $\Omega$. Then

$$F(x_*, r) = f_0(x_*) + \psi(x_*, r) = f_0(x_*),$$

for $x_* \in \Omega$ and $\psi(x, r) = 0$ with $x \in \Omega$. Therefore

$$\min_x F(x, r) = m(r) \le m,$$

i.e. $x(r) \in \Omega_m(r)$.
Consider now the sets

$$\Omega_m(r) = \{x : f_0(x) + \psi(x, r) \le f_0(x_*)\}.$$

These sets are compact by assumption, and because of the increase
of $\psi(x, r)$ with increasing $r$, we have

$$\Omega_m(r_2) \subset \Omega_m(r_1), \quad r_1 < r_2.$$

Let now $\{r_k\}$, $k \to \infty$, be an increasing sequence of values of $r$ with $r_k \to +\infty$.
Then

$$\Omega_m(r_k) \subset \Omega_m(r_1).$$

Since, as was shown above, $x(r) \in \Omega_m(r)$, all points $x(r_k)$ belong
to the compact set $\Omega_m(r_1)$. Therefore, without loss of generality, we can
take that the sequence $\{x(r_k)\}$ converges to a certain point $x_0$.

Let us show that $x_0 \in \Omega$ and $f_0(x_0) = m$. Indeed, if $x_0 \notin \Omega$, then
$\psi(x(r_k), r_k) \to +\infty$ and consequently $F(x(r_k), r_k) \to +\infty$, for

$$f_0(x(r_k)) \ge \min_{x \in \Omega_m(r_1)} f_0(x).$$

But this contradicts the fact that $F(x(r_k), r_k) = m(r_k) \le m$.
Thus $x_0 \in \Omega$.
Further,

$$m(r_k) = F(x(r_k), r_k) = f_0(x(r_k)) + \psi(x(r_k), r_k) \le f_0(x_0) + \psi(x_0, r_k) = f_0(x_0).$$

Hence

$$\lim_{k \to \infty} m(r_k) = \lim_{k \to \infty} \bigl(f_0(x(r_k)) + \psi(x(r_k), r_k)\bigr) \le f_0(x_0). \tag{8.5}$$

But $f_0(x(r_k)) \to f_0(x_0)$. Therefore

$$\lim_{k \to \infty} \psi(x(r_k), r_k) \le f_0(x_0) - \lim_{k \to \infty} f_0(x(r_k)) = 0.$$

As $\psi(x, r) \ge 0$, it follows that

$$\lim_{k \to \infty} \psi(x(r_k), r_k) = 0.$$

Thus,

$$\lim_{k \to \infty} m(r_k) = \lim_{k \to \infty} f_0(x(r_k)) + \lim_{k \to \infty} \psi(x(r_k), r_k) = f_0(x_0) \ge m.$$

On the other hand, $m(r_k) \le m$. Therefore $\lim_{k \to \infty} m(r_k) = f_0(x_0) \le m$.

Comparing the last inequality with the preceding one, we see that
$\lim_{k \to \infty} m(r_k) = f_0(x_0) = m$, and this completes the proof of the
theorem.

The theorem proved shows that replacing the solving of problem (8.1)
by the minimization of the function $F(x, r)$ with large $r$
permits us to come nearer to the solution of the original problem.
Let us estimate the character of this convergence for the nonconvex
case. The problem of convex programming will be studied in the
next subsection.

Theorem 8.2. Let the functions $f_i(x)$, $i = 0, 1, \dots, m$, be continuously
differentiable and the conditions of theorem 8.1 be satisfied.
Moreover, we assume that:

(1) problem (8.1) has a unique solution,

(2) the function $\psi(x, r)$ is chosen in form (8.3) and the minimum of
$F(x, r)$ is attained at a unique point $x(r)$ for large $r$,

(3) at the solution $x_*$ of problem (8.1) the gradients $f_i'(x_*)$, $i \in J_0(x_*)$,
are linearly independent, where

$$J_0(x_*) = \{i : f_i(x_*) = 0, \ i = 1, \dots, m\}.$$

Then $\lim_{r \to \infty} x(r) = x_*$ and

$$\lim_{r \to \infty} r\,\varphi_1(f_i(x(r))) = \tfrac{1}{2} u^i, \quad i = 1, \dots, m,$$

where $u^i$ are the Lagrange multipliers of problem (8.1).

Remark. Recall that according to theorem 4.1 (Chap. I) the necessary
conditions for a minimum are fulfilled at the point $x_*$ in the following form:

$$f_0'(x_*) + \sum_{i=1}^{m} u^i f_i'(x_*) = 0, \qquad u^i \ge 0, \quad u^i f_i(x_*) = 0, \quad i = 1, \dots, m. \tag{8.6}$$


Proof. First, we shall show that $x(r) \to x_*$. Suppose that the
opposite holds. Then there is a sequence $r_k \to \infty$ such that
$\|x(r_k) - x_*\| \ge \delta_0 > 0$. Since in proving theorem 8.1 it was
shown that $x(r_k) \in \Omega_m(r_1)$ and the set $\Omega_m(r_1)$ is compact, we can take,
without loss of generality (if required we can take a subsequence),
that $x(r_k) \to x_{**}$, and clearly $\|x_{**} - x_*\| \ge \delta_0 > 0$. However,
it follows from theorem 8.1 that $x_{**}$ is a solution of problem (8.1).
Thus we have obtained two different solutions of problem (8.1).
But this contradicts the assumption.

Thus, it has been shown that $x(r) \to x_*$. We turn to the proof
of the second statement of the theorem. As $x(r)$ is the minimum point
of the function $F(x, r)$, at this point the gradient of the function

$$F(x, r) = f_0(x) + r \sum_{i=1}^{m} \varphi_0(f_i(x))$$

must be equal to zero. Simple calculations show that we come to
the equality

$$F'(x(r), r) = f_0'(x(r)) + \sum_{i=1}^{m} 2 r\,\varphi_1(f_i(x(r)))\, f_i'(x(r)) = 0$$

or, with the notation

$$u^i(r) = 2 r\,\varphi_1(f_i(x(r))),$$

to the equality

$$f_0'(x(r)) + \sum_{i=1}^{m} u^i(r)\, f_i'(x(r)) = 0. \tag{8.7}$$

Note now that as $x(r) \to x_*$ we have $f_i(x(r)) < 0$ for $i \notin J_0(x_*)$ and large $r$,
since $f_i(x_*) < 0$, $i \notin J_0(x_*)$. Therefore with large $r$

$$u^i(r) = 2 r\,\varphi_1(f_i(x(r))) = 0, \quad i \notin J_0(x_*).$$

But $u^i f_i(x_*) = 0$ and therefore for $i \notin J_0(x_*)$ we have $u^i = 0$.
Thus, taking into account the expression for $u^i(r)$, the statement of
the theorem is proved for $i \notin J_0(x_*)$.

Due to the foregoing, we can rewrite (8.7) and (8.6) as follows:

$$f_0'(x(r)) + \sum_{i \in J_0(x_*)} u^i(r)\, f_i'(x(r)) = 0, \qquad f_0'(x_*) + \sum_{i \in J_0(x_*)} u^i f_i'(x_*) = 0. \tag{8.8}$$

If we take into account that $x(r) \to x_*$, that $f_i'(x)$ are continuous for
all $x$ and that $f_i'(x_*)$, $i \in J_0(x_*)$, are linearly independent, then it is
easy to conclude from (8.8) that $u^i(r) \to u^i$, and this, with account
taken of the expression for $u^i(r)$, completes the proof of the theorem.

Remark. It follows from theorem 8.2 that if $u^i > 0$, then with
large $r$ the functions $f_i(x(r))$ are strictly greater than zero and $f_i(x(r))$
tends to zero at the same rate as the quantity $u^i r^{-1}$ does. Thus the
approximate solution will always violate the constraint $f_i(x) \le 0$ if
$u^i > 0$.
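The behaviour described by theorems 8.1 and 8.2 can be observed numerically. The following sketch is an illustration only: the one-dimensional test problem (minimize $x^2$ subject to $1 - x \le 0$, so that $x_* = 1$, $m = 1$, $u = 2$), the golden-section search used for the inner minimization, and all function names are assumptions made here, not part of the text. For this problem $x(r) = r/(1+r)$ and $m(r) = r/(1+r)$ in closed form.

```python
import math


def phi0(t):
    """Quadratic penalty term: t^2 for t > 0, otherwise 0."""
    return t * t if t > 0.0 else 0.0


def phi1(t):
    """Positive part: t for t > 0, otherwise 0."""
    return t if t > 0.0 else 0.0


def golden_section_min(func, lo, hi, tol=1e-10):
    """Minimize a unimodal function on [lo, hi] by golden-section search."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        left = hi - ratio * (hi - lo)
        right = lo + ratio * (hi - lo)
        if func(left) < func(right):
            hi = right
        else:
            lo = left
    return 0.5 * (lo + hi)


def f0(x):
    return x * x          # assumed objective


def f1(x):
    return 1.0 - x        # assumed constraint f1(x) <= 0


def penalized(x, r):
    """F(x, r) = f0(x) + r * phi0(f1(x)) for the single constraint."""
    return f0(x) + r * phi0(f1(x))


results = []
for r in (1.0, 10.0, 100.0, 1000.0):
    x_r = golden_section_min(lambda x: penalized(x, r), -5.0, 5.0)
    m_r = penalized(x_r, r)                 # m(r)
    u_r = 2.0 * r * phi1(f1(x_r))           # multiplier estimate u(r)
    results.append((r, x_r, m_r, u_r))
    print(r, x_r, m_r, u_r)
```

In this example every $x(r)$ lies outside the admissible set (the constraint is violated by $1/(1+r)$), $m(r)$ increases towards $m$, and $u(r)$ approaches the multiplier, which is the behaviour the remark above describes.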


Convex Programming 


In the case of the problem of convex programming, the estimates
of the approximation of $x(r)$ to the sought solution $x_*$ can be made
more precise.

Theorem 8.3. Let all the functions $f_i(x)$, $i = 0, 1, \dots, m$, be
convex, the conditions of theorem 8.1 for the function $\psi(x, r)$ in form (8.3)
be satisfied and, besides, the necessary conditions in the form of the
Kuhn-Tucker theorem be fulfilled at the point $x_*$ which is the solution of
problem (8.1), i.e. there are numbers $u^i \ge 0$ such that

$$f_0(x_*) \le \sum_{i=1}^{m} u^i f_i(x) + f_0(x) \quad \text{for all } x, \qquad u^i f_i(x_*) = 0, \quad i = 1, \dots, m. \tag{8.9}$$

Then

$$f_i(x(r)) \le \frac{\|u\|}{r} \quad \text{if } f_i(x(r)) > 0, \tag{8.10}$$

$$f_0(x(r)) \ge f_0(x_*) - \frac{5\|u\|^2}{4r}, \tag{8.11}$$

where $\|u\|^2 = \sum_{i=1}^{m} (u^i)^2$.


Proof. We introduce the notation $J_0(x) = \{i : f_i(x) > 0, \ i = 1, \dots, m\}$. As

$$F(x(r), r) = f_0(x(r)) + r \sum_{i=1}^{m} \varphi_0(f_i(x(r))) \le f_0(x_*),$$

it follows from (8.9) that

$$f_0(x(r)) + r \sum_{i=1}^{m} \varphi_0(f_i(x(r))) \le f_0(x(r)) + \sum_{i=1}^{m} u^i f_i(x(r))$$

or

$$r \sum_{i=1}^{m} \varphi_0(f_i(x(r))) \le \sum_{i=1}^{m} u^i f_i(x(r)).$$

But for $i \notin J_0(x(r))$

$$\varphi_0(f_i(x(r))) = 0, \qquad f_i(x(r)) \le 0,$$

and $\varphi_0(f_i(x(r))) = f_i^2(x(r))$ for $i \in J_0(x(r))$. Therefore the inequality
obtained may be made stronger:

$$r \sum_{i \in J_0(x(r))} f_i^2(x(r)) \le \sum_{i \in J_0(x(r))} u^i f_i(x(r)) \le \|u\| \sqrt{\sum_{i \in J_0(x(r))} f_i^2(x(r))}.$$

In deriving the last inequality we used the well-known Cauchy-Bunyakovsky
inequality.

Thus, for $i \in J_0(x(r))$,

$$f_i(x(r)) \le \sqrt{\sum_{j \in J_0(x(r))} f_j^2(x(r))} \le \frac{\|u\|}{r}, \tag{8.12}$$

and it follows that (8.10) holds.
Further, for all $x$

$$f_0(x_*) \le f_0(x) + \sum_{i=1}^{m} u^i f_i(x) \le f_0(x) + \sum_{i \in J_0(x)} u^i f_i(x)$$
$$= f_0(x) + r \sum_{i \in J_0(x)} f_i^2(x) - \sum_{i \in J_0(x)} \Bigl(\sqrt{r}\, f_i(x) - \frac{u^i}{2\sqrt{r}}\Bigr)^2 + \sum_{i \in J_0(x)} \frac{(u^i)^2}{4r}$$
$$\le f_0(x) + r \sum_{i=1}^{m} \varphi_0(f_i(x)) + \frac{\|u\|^2}{4r} = F(x, r) + \frac{\|u\|^2}{4r}.$$

In writing this formula it was taken into account that

$$\sum_{i=1}^{m} \varphi_0(f_i(x)) = \sum_{i \in J_0(x)} f_i^2(x) \tag{8.13}$$

by the definition of $\varphi_0(t)$ and $J_0(x)$. Thus

$$f_0(x_*) \le f_0(x(r)) + r \sum_{i=1}^{m} \varphi_0(f_i(x(r))) + \frac{\|u\|^2}{4r}.$$

But it follows from (8.12) and (8.13) that

$$\sum_{i=1}^{m} \varphi_0(f_i(x(r))) \le \frac{\|u\|^2}{r^2}.$$

Therefore

$$f_0(x(r)) \ge f_0(x_*) - \frac{5\|u\|^2}{4r},$$

Q.E.D.
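As an illustrative check of the two estimates of theorem 8.3, written here in the form in which they come out of the proof ($f_i(x(r)) \le \|u\|/r$ and $f_0(x(r)) \ge f_0(x_*) - 5\|u\|^2/(4r)$), one may take a one-dimensional convex instance chosen for convenience: $f_0(x) = -x$, $f_1(x) = x$, so that $x_* = 0$ and $u = 1$. The example is an assumption made for this check, with the quadratic penalty $\varphi_0(t) = t^2$ for $t > 0$.

```latex
\begin{align*}
F(x, r) &= -x + r\,\varphi_0(x), \qquad
F'(x, r) = -1 + 2 r x = 0 \ \ (x > 0)
\;\Longrightarrow\; x(r) = \frac{1}{2r}, \\
f_1(x(r)) &= \frac{1}{2r} \;\le\; \frac{\|u\|}{r} = \frac{1}{r}, \\
f_0(x(r)) &= -\frac{1}{2r} \;\ge\; f_0(x_*) - \frac{5\|u\|^2}{4r} = -\frac{5}{4r}.
\end{align*}
```

Both inequalities hold with room to spare in this instance, which suggests that the constants in the estimates are not tight.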


Computational Aspects 


The methods expounded above reduce problem (8.1) to the minimization
of the function $F(x, r)$. To obtain an approximate solution one can now
use one of the methods described in Chap. II. The following specific
circumstances should, however, be taken into account. If the functions
$f_i(x)$ are not convex, then the function $F(x, r)$ is also not convex with
respect to $x$. Therefore, it can have local minima, while in all the
preceding text it was assumed that we were determining the global
minimum $x(r)$.

As all the methods of Chap. II are meant for finding a local
minimum, if the function to be minimized is not convex, it is a
local minimum that will be found if the initial approximation is
poor. This affects the convergence and is an important shortcoming
of the penalty function method in its application to nonconvex
problems.

If the problem under consideration is one of convex programming
and function (8.3) is used as $\psi(x, r)$, then it is easily ascertained
that $F(x, r)$ is convex too; therefore, the difficulty mentioned above
is removed. However, another difficulty arises. The fact is that one
should take $r$ sufficiently large in order to obtain a good approximation;
this follows from the estimates obtained above. In this case
all derivatives of $F(x, r)$ with respect to $x$ will also be large, for
they are proportional to $r$. But it was established in analysing all
the methods described in Chap. II whose rate of convergence is superlinear
that the size of the region in which the rate of convergence becomes
superlinear is in inverse proportion to the Lipschitz constant of
the second derivatives, i.e. in the case under consideration this region
will be small, and even a method which, in the limit, theoretically
converges rapidly can become ineffective. Moreover, as the function
$\varphi_0(t)$ has no second derivative at $t = 0$, the function $F(x, r)$ constructed
with $\psi(x, r)$ from formula (8.3) will also have no second
derivatives at points $x$ for which $f_i(x) = 0$ for a certain $i$. But if
the solution $x_*$ lies on the boundary of the domain, it is this case that
will take place. On the other hand, all methods which converge
at a fast rate require that the function being minimized have second
derivatives at least in a certain region about the point sought.

All of the difficulties mentioned are, as a rule, observed in practical
calculations, and this lowers the effectiveness of the method.


Fiacco and McCormick Method 


This method is practically applicable only to problems of convex
programming. It is based on an idea close to that of the penalty
function method; however, in this method the approximations
approach the solution from inside the domain and not from outside
as in the penalty function method.

Let us again consider problem (8.1). Suppose that all the functions
$f_i(x)$ are convex and there is a point $\tilde x$ such that $f_i(\tilde x) < 0$,
$i = 1, \dots, m$, so that the interior of the admissible set $\Omega$ is nonempty.
We compose the function

$$P(x, r) = f_0(x) - r \sum_{i=1}^{m} \frac{1}{f_i(x)}, \quad r > 0,$$

defined inside the set $\Omega$. It is easily ascertained that $P(x, r)$ is convex
with respect to $x$ inside $\Omega$. If we denote by $x(r)$ the minimum point
of $P(x, r)$ in $\Omega$, then under sufficiently general assumptions analogous
to those of theorems 8.1 and 8.2 it can be shown that

$$\lim_{r \to +0} x(r) = x_*, \qquad \lim_{r \to +0} \frac{r}{f_i^2(x(r))} = u^i.$$

Thus, the approximate method of solving problem (8.1) again has
been reduced to the problem of unconstrained minimization of
the function $P(x, r)$.
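The connection between the interior function and the Lagrange multipliers can be seen from the stationarity condition alone. The short derivation below is a sketch that assumes the functions $f_i$ are differentiable and that $P$ has the form $f_0(x) - r \sum_i 1/f_i(x)$; the notation $u^i(r)$ is introduced here by analogy with the penalty function method.

```latex
\begin{align*}
P'(x, r) &= f_0'(x) + r \sum_{i=1}^{m} \frac{f_i'(x)}{f_i^2(x)}, \\
0 = P'(x(r), r) &= f_0'(x(r)) + \sum_{i=1}^{m} u^i(r)\, f_i'(x(r)),
\qquad u^i(r) = \frac{r}{f_i^2(x(r))} > 0 .
\end{align*}
```

Comparing the last equality with the necessary conditions for a minimum shows why the quantities $r / f_i^2(x(r))$ are natural estimates of the multipliers.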

Of the specific traits of this subsidiary problem the same things
can be said that were said of the penalty function method in the
subsection on p. 242. In order to illustrate these traits and to show
why even effective methods of minimization of $F(x, r)$ or $P(x, r)$
may fail to provide a fast rate of convergence, we shall adduce
a simple example.

Let $f_0(x) = -x$, $f_1(x) = x$, $x \in E^1$, i.e. we are solving the problem
of minimization of $-x$ subject to $x \le 0$. The obvious solution
is $x_* = 0$:

$$P(x, r) = -x - \frac{r}{x}.$$

Equating to zero the derivative of $P(x, r)$ with respect to $x$ we
obtain

$$P'(x, r) = -1 + \frac{r}{x^2} = 0. \tag{8.14}$$

Hence $x(r) = -\sqrt{r}$. Let us now apply to the solving of (8.14) a method
that converges at a quadratic rate, Newton's method, i.e.
obtain approximations by the formula

$$x_{k+1} = x_k - \frac{P'(x_k, r)}{P''(x_k, r)}.$$

Substituting the expressions for $P'(x, r)$ and $P''(x, r)$ we obtain after
simple transformations

$$v_{k+1} = \frac{2\sqrt{r} - x_k}{2r}\, v_k^2, \qquad v_k = x_k + \sqrt{r}. \tag{8.15}$$

It is clear from formula (8.15) that the deviation of $x_k$ from the
solution $x(r) = -\sqrt{r}$ tends to zero monotonically only with initial
points such that

$$\left| \frac{2\sqrt{r} - x_k}{2r}\, v_k \right| < 1.$$

As $x_k < 0$ (the approximation is sought in the region $x < 0$), we
have

$$\frac{|v_k|}{\sqrt{r}} < 1, \qquad |v_k| < \sqrt{r}.$$

Thus the last formula shows that a quadratic rate of convergence of
Newton's method will be guaranteed only in a domain in which
$x_k$ deviates from the solution by not more than $\sqrt{r}$, i.e. the domain
of convergence of Newton's method shrinks to zero with decreasing $r$,
and the size of this domain is of the order of magnitude of the deviation
of $x(r)$ from the true solution $x_*$ of the problem. This indicates
that the greater part of the computational work will be spent on
hitting the region of convergence of Newton's method, while in cases
where Newton's method converges well it is no longer necessary,
as the approximation already obtained deviates from $x_*$ by as much
as $x(r)$ does.
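The shrinking region of convergence is easy to observe directly. The sketch below applies Newton's method to $P'(x, r) = -1 + r/x^2 = 0$ for the example $f_0 = -x$, $f_1 = x$; the value $r = 10^{-4}$, the two starting points and the function name `newton_barrier` are assumptions chosen here for illustration.

```python
import math


def newton_barrier(x, r, steps):
    """Newton iterations for P'(x, r) = -1 + r / x**2 = 0, returning all iterates."""
    history = [x]
    for _ in range(steps):
        first = -1.0 + r / (x * x)       # P'(x, r)
        second = -2.0 * r / (x ** 3)     # P''(x, r)
        x = x - first / second
        history.append(x)
    return history


r = 1e-4
root = -math.sqrt(r)                      # x(r) = -sqrt(r)

# Start with |v_0| = 0.5 * sqrt(r), inside the region |v_k| < sqrt(r).
good = newton_barrier(1.5 * root, r, 12)

# Start at x_0 = -1, where |v_0| is far larger than sqrt(r).
bad = newton_barrier(-1.0, r, 1)

print(good[-1], root)
print(bad[1])
```

From the first starting point the iterates settle on $-\sqrt{r}$; from the second, a single Newton step already leaves the region $x < 0$ in which $P(x, r)$ is being minimized.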


9. PROJECTION METHODS 
WITH RESTORATION OF TIES 


Construction of the Methods 


Consider the problem of minimization of the function $f_0(x)$ with
the following conditions:

$$f_i(x) = 0, \quad i = 1, \dots, m, \quad m < n. \tag{9.1}$$

We set $g = (f_1, \dots, f_m)$, $S_g = \{x : g(x) = 0\}$ and suppose that
all the functions $f_0(x), f_1(x), \dots, f_m(x)$ are continuously differentiable
and $S_g$ is a smooth manifold (of dimension $n - m$), i.e. that
at any point $x \in S_g$ the rank of the matrix $g'(x)$ is equal to $m$
($g'(x) = \{\partial f_i / \partial x^j\}$, $i = 1, \dots, m$, $j = 1, \dots, n$, $i$ is the row index).
Consequently, at any point $\bar x \in S_g$ a hyperplane tangent to $S_g$ can be
constructed:

$$g'(\bar x)(x - \bar x) = 0. \tag{9.2}$$

Further on, we denote this hyperplane (i.e. the set of points that
satisfy equation (9.2)) by $T(\bar x)$.

One possible approach to the construction of iterative processes
for solving the problem formulated is based on the following
considerations.

Let $x_0$ be an arbitrary point of $S_g$ such that the gradient $f_0'(x_0)$
is not orthogonal to the hyperplane $T(x_0)$ (i.e. the necessary
condition for an extremum of the function $f_0(x)$ on the manifold $S_g$ is not
fulfilled at the point $x_0$). Then in the plane $T(x_0)$ there are infinitely many
directions of descent of $f_0(x)$ (i.e. there are infinitely many directions
$x - x_0$ which belong to $T(x_0)$ and are such that $(f_0'(x_0), x - x_0) < 0$).
Suppose we have determined one of these directions $v_0 = \bar x_0 - x_0$
and constructed the point $\bar x_0(\alpha) = x_0 + \alpha v_0$ such that
$f_0(\bar x_0(\alpha)) < f_0(x_0)$. The point $\bar x_0(\alpha)$ no longer satisfies conditions (9.1).
However, if the value of the parameter $\alpha$ is sufficiently small (the quantity
$\|x_0 - \bar x_0(\alpha)\|$ is small), then using the point $\bar x_0(\alpha)$ we can
construct in several ways a point $x_1 \in S_g$ such that

$$f_0(x_1) < f_0(x_0). \tag{9.3}$$

This statement is based on the fact that we can choose a point $x_1(\alpha)$
on the smooth manifold $S_g$ (and not a single one) so that the following
condition is fulfilled:

$$x_1(\alpha) - x_0 = \bar x_0(\alpha) - x_0 + w_1(\alpha),$$
$$\|w_1(\alpha)\| = \|x_1(\alpha) - \bar x_0(\alpha)\| = o(\|\bar x_0(\alpha) - x_0\|). \tag{9.4}$$

(This can be proved strictly by using the theorem on the mapping onto one
another of the regions about the point $x_0$ in the manifold $S_g$ and in the
tangent manifold $T(x_0)$; the theorem holds in the space $E^n$; see L. A. Lusternik
and V. I. Sobolev.)
If (9.4) holds, we have, since $f_0(x)$ is differentiable,

$$f_0(x_1) = f_0(x_0) + (f_0'(x_0), x_1 - x_0) + o(\|x_1 - x_0\|)$$
$$= f_0(x_0) + (f_0'(x_0), \bar x_0 - x_0) + o(\|\bar x_0 - x_0\|) + (f_0'(x_0), x_1 - \bar x_0) + o(\|x_1 - \bar x_0\|)$$
$$= f_0(x_0) + (f_0'(x_0), \bar x_0 - x_0) + o_1(\|\bar x_0 - x_0\|),$$

where $o_1(t)/t \to 0$ as $t \to 0$.
Hence if the parameter $\alpha$ is sufficiently small, inequality (9.3) is
satisfied.

By constructing a point $x_1 \in S_g$ at which condition (9.3) is fulfilled,
we have performed in essence an iteration of a certain process of
descent for constructing successive approximations to the solution.
Thus the $k$-th iteration of a process of the type being described
consists in the following.

1. The direction of descent $v_k = \bar x_k - x_k$ of the function $f_0(x)$ in the
tangent hyperplane $T(x_k)$ is determined.

2. A step of definite length is made in the direction $v_k$:
$\bar x_k(\alpha) = x_k + \alpha v_k$ (so that $f_0(\bar x_k(\alpha)) < f_0(x_k)$).

3. Using the point $\bar x_k(\alpha)$, a point $x_{k+1} \in S_g$ is determined such that
the condition $f_0(x_{k+1}) < f_0(x_k)$ is satisfied.

It is clear from the foregoing that for moving from the
point $x_k$ we can choose different directions of descent in the plane $T(x_k)$.
The choice of the quantity $\alpha_k$ and the final step of the iteration, the
construction of the point $x_{k+1}$, are also not determined uniquely. Performing
each of the three stages of the iteration in different ways, we can construct
a whole class of processes of descent of the type described.

Consider now several possible methods of choosing the vector $v_k$. We
can take as $v_k$ the projection of the antigradient $-f_0'(x_k)$
onto the plane $T(x_k)$. The construction of such a vector is equivalent
to the solving of the problem of minimization of the function

$$F_k(x) = (f_0'(x_k), x - x_k) + \tfrac{1}{2} \|x - x_k\|^2 \tag{9.5}$$

provided that $x \in T(x_k)$. Applying the method of Lagrange multipliers,
we find that

$$v_k = -\bigl(I - g'^{*} (g' g'^{*})^{-1} g'\bigr) f_0'(x_k) \tag{9.6}$$

where $g' = g'(x_k)$.
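For a single tie ($m = 1$) the matrix $g' g'^{*}$ in formula (9.6) is a scalar, and the projection can be written out in a few lines. The sketch below is an illustration only; the function name `project_antigradient`, the objective $f_0(x) = x^1 + 2x^2 + 3x^3$ and the tie $(x^1)^2 + (x^2)^2 + (x^3)^2 - 1 = 0$ are hypothetical choices made here.

```python
def project_antigradient(grad_f0, grad_g):
    """Formula (9.6) for one tie: v = -(f0' - g' (g', f0') / (g', g'))."""
    gg = sum(c * c for c in grad_g)                      # g' g'^T, a scalar when m = 1
    gf = sum(c * d for c, d in zip(grad_g, grad_f0))     # (g', f0')
    return [-(d - c * gf / gg) for c, d in zip(grad_g, grad_f0)]


# Hypothetical data: f0(x) = x1 + 2 x2 + 3 x3, tie x1^2 + x2^2 + x3^2 - 1 = 0.
x_k = [1.0, 0.0, 0.0]                  # a point on the unit sphere
grad_f0 = [1.0, 2.0, 3.0]              # f0'(x_k)
grad_g = [2.0 * c for c in x_k]        # g'(x_k)

v_k = project_antigradient(grad_f0, grad_g)
print(v_k)
```

The resulting vector lies in the tangent plane $T(x_k)$, since $(g'(x_k), v_k) = 0$, and is a direction of descent, since $(f_0'(x_k), v_k) < 0$.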

More effective projection methods with restoration of ties can
be constructed by choosing as $v_k$ a vector that minimizes the function

$$F_k(x) = (f_0'(x_k), x - x_k) + \tfrac{1}{2} (f_0''(x_k)(x - x_k), x - x_k) \tag{9.7}$$

on the plane $T(x_k)$ (there is such a vector if $F_k$ is a convex function).
Since in this case the quadratic approximation to the function
being minimized is in effect used for the construction of the
direction of motion, we shall call the methods in which $v_k$ is constructed
in the way described methods of the second order.

Consider now the method of restoration of ties (the third stage 
of an iteration) which will be used in what follows. 

Let the system of equations (9.1) in a certain region about any
point $x \in S_g$ define the function $y = y(z)$, where $y$ is an $m$-dimensional
vector of coordinates and $z$ is an $(n - m)$-dimensional vector.
Without loss of generality, we can take $y = (x^1, \dots, x^m)$,
$z = (x^{m+1}, \dots, x^n)$. By the theorem of implicit functions, it is
necessary for the existence of the function $y(z)$ and its derivatives that
at any point $x \in S_g$ we have the determinant

$$|g_y(x)| = \left| \left\{ \frac{\partial f_i}{\partial x^j} \right\} \right| \ne 0, \quad i, j = 1, \dots, m. \tag{9.8}$$

In this case, the point $x_{k+1} = (z_{k+1}, y_{k+1})$ can be constructed by the
formulas

$$z_{k+1} = z_k + \alpha_k p_k, \qquad y_{k+1} = y(z_{k+1}) \tag{9.9}$$

where $p_k = \bar z_k - z_k$ is the corresponding part of the vector $v_k$. To construct
sequence (9.9), there is no need to find the explicit expression
of the function $y(z)$; it is sufficient to be able to evaluate it (i.e. to
solve system (9.1)) with a fixed vector $z$.

The construction of sequence (9.9) (besides being one of the
possible methods of restoration of ties) can be
considered in another respect, as an iterative process of minimization
of the function $\varphi(z) = f_0(z, y(z))$. Obviously, with condition
(9.8) fulfilled, the minimization of $\varphi(z)$ is equivalent to the solving
of the original problem. The vector $p_k$ (which is the direction of descent
of the function $\varphi(z)$) can be considered in this case to be the solution
of the problem of minimization of the function $\psi_k(z) = F_k(z, y_k(z))$,
where the function $F_k(z, y)$ is determined by formula (9.5) or (9.7), and
the vector-function $y_k(z)$ is determined from the linearized equation
of ties (i.e. from the equation of the tangent plane $T(x_k)$)

$$g_y(x_k)(y - y_k) + g_z(x_k)(z - z_k) = 0.$$

Hence

$$y_k(z) = y_k - g_y^{-1}(x_k)\, g_z(x_k)(z - z_k).$$

The fact that the vector $p_k$ determined by the methods described is a
direction of descent of $\varphi(z)$ follows from the equality
$\psi_k'(z_k) = \varphi'(z_k)$, where

$$\varphi'(z_k) = f_{0z}(x_k) + y'^{*}(z_k)\, f_{0y}(x_k),$$
$$f_{0z} = \Bigl( \frac{\partial f_0}{\partial x^{m+1}}, \dots, \frac{\partial f_0}{\partial x^{n}} \Bigr), \qquad
f_{0y} = \Bigl( \frac{\partial f_0}{\partial x^{1}}, \dots, \frac{\partial f_0}{\partial x^{m}} \Bigr),$$
$$y'(z_k) = -g_y^{-1}(x_k)\, g_z(x_k). \tag{9.10}$$

Since a process of the (9.9) type can be considered to be a method
of minimization of the function $\varphi(z)$, it is easy to see that the vector $-\varphi'(z_k)$
can be taken as $p_k$; then sequence (9.9) will be the gradient method
of minimization of $\varphi(z)$.
Note that the vector $p_k$ providing the minimum of the function
$F_k(z, y_k(z))$, where $F_k(z, y)$ is defined by expression (9.5), is calculated
by the following formula:

$$p_k = -\bigl(I + y'^{*}(z_k)\, y'(z_k)\bigr)^{-1} \varphi'(z_k). \tag{9.11}$$

Consequently, sequence (9.9) in which $p_k$ is determined by formula
(9.11) is also a method of the gradient type for the minimization
of $\varphi(z)$. Methods of the gradient type we shall call methods of the
first order.

Newton's method and its modifications can be applied in principle
to the minimization of the function $\varphi(z)$, provided certain necessary
requirements are fulfilled. However, it should be noted that the
calculation of the second derivative $\varphi''(z)$ is, as a rule, very laborious,
for it requires the calculation of the second derivative of the
vector-function $y(z)$, i.e., practically, the calculation of the second
derivatives of the functions $f_1(x), \dots, f_m(x)$.

Suppose that condition (9.8) is not fulfilled and the following
weaker requirement is satisfied: at any point $x \in S_g$ at least one
determinant of order $m$ is not zero,

$$\left| \left\{ \frac{\partial f_i}{\partial x^j} \right\} \right| \ne 0, \quad j = j_1, \dots, j_m, \quad j_s \in \{1, \dots, n\}, \quad i = 1, \dots, m. \tag{9.12}$$

The weakening of the requirements on the functions $f_i$ is that at different
points of the set $S_g$ different determinants may be nonzero. In this case,
the coordinates of the point $x \in S_g$ which form the vector $z$ and the
vector-function $y(z)$ can be, generally speaking, different at different points
of the manifold $S_g$: $z = (x^{j_{m+1}}, \dots, x^{j_n})$, $y = (x^{j_1}, \dots, x^{j_m})$.
Taking this into account we can, as before, use formula (9.9) for the
restoration of ties. Each step of process (9.9) can be treated as a step of the
process of minimization of a certain function $\varphi(x^{j_{m+1}}, \dots, x^{j_n})$
for which the corresponding vector $p_k$ is the direction of descent.

Methods of the (9.9) type will be studied below. It will be convenient
to denote any vector-function $(x^{j_1}, \dots, x^{j_m})$ by $y$ and the vector
of independent variables by $z$ (as we did when condition (9.8) was
fulfilled). Accordingly, any of the determinants $|\{\partial f_i / \partial x^j\}|$
of order $m$ will be denoted by $|g_y|$ and the function $f_0(z, y(z))$ by $\varphi(z)$.
The absolute value of function | g, (x) |shall be denoted by | g, (x) |,.- 

In the following two subsections we shall study the properties of
the methods of the first and second order. In the fourth subsection
we shall consider methods of dual and conjugate directions for
the minimization of $\varphi(z)$ (or the algorithms based on methods
of this type). From the viewpoint of practical computations it is just
these algorithms that are of the greatest interest.

Methods of the First Order


We shall study the properties of methods based on the linearization
of the function $f_0(x)$ and of the ties $f_i$, $i = 1, \dots, m$.

Consider the algorithm whose every step is a step of the gradient
method for minimization of a certain function $\varphi(z)$:

$$z_{k+1} = z_k - \alpha_k \varphi'(z_k), \qquad y_{k+1} = y(z_{k+1}) \tag{9.13}$$

where $z_k$ is the vector corresponding to the determinant $|g_y(x)|$ which
has at the point $x_k \in S_g$ the maximum absolute value of all the determinants
$|g_y|$, the gradient $\varphi'(z_k)$ is calculated by formula (9.10) and the
parameter $\alpha_k$ can be determined by one of the methods described
in studying gradient methods (Sec. 1, Chap. II). We shall choose
as $\alpha_k$ the maximum value of the parameter obtained by successive
reductions of a certain positive constant which satisfies

$$f_0(z, y(z)) - f_0(z_k, y_k) \le -\varepsilon \alpha \|\varphi'(z_k)\|^2, \quad 0 < \varepsilon < 1 \tag{9.14}$$

where $z = z_k - \alpha \varphi'(z_k)$ (this is an analogue of the method of choosing
$\alpha_k$ according to condition (1.2), Chap. II).
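A first-order step with restoration of ties and the step-reduction rule can be sketched as follows. The example is an assumption made for illustration: $f_0 = y + z$ with the single tie $g(y, z) = y^2 + z^2 - 2 = 0$, where $y$ is the dependent and $z$ the independent coordinate, so that the solution is $(y, z) = (-1, -1)$. The tie is restored by Newton's method in $y$ with $z$ fixed; the halving of $\alpha$, the value $\varepsilon = 0.3$ and all function names are choices made here, not prescribed by the text.

```python
def restore_tie(z, y_start, iterations=50):
    """Solve g(y, z) = y^2 + z^2 - 2 = 0 for y by Newton's method with z fixed."""
    y = y_start
    for _ in range(iterations):
        if y == 0.0:
            return None
        step = (y * y + z * z - 2.0) / (2.0 * y)
        y -= step
        if abs(step) < 1e-14:
            return y
    return None   # the manifold was not reached from this starting value


def reduced_gradient(z, y):
    """phi'(z) = f0z + y'(z) f0y with y'(z) = -g_z / g_y, for f0 = y + z."""
    y_prime = -(2.0 * z) / (2.0 * y)
    return 1.0 + y_prime * 1.0


def first_order_method(z, y, eps=0.3, alpha0=1.0, tol=1e-10, max_iter=100):
    """Gradient steps in z, restoration of the tie in y, step halving until
    the sufficient-decrease inequality holds."""
    for _ in range(max_iter):
        grad = reduced_gradient(z, y)
        if abs(grad) < tol:
            break
        alpha = alpha0
        accepted = False
        for _halving in range(60):
            z_new = z - alpha * grad
            y_new = restore_tie(z_new, y)
            decrease_ok = (
                y_new is not None
                and (y_new + z_new) - (y + z) <= -eps * alpha * grad * grad
            )
            if decrease_ok:
                z, y, accepted = z_new, y_new, True
                break
            alpha *= 0.5
        if not accepted:
            break   # no admissible step found (e.g. rounding limits reached)
    return z, y


z_final, y_final = first_order_method(-0.2, -1.4)
print(z_final, y_final)
```

Note that the function $y(z)$ is never written out explicitly: each trial point only requires the tie to be solved for $y$ at a fixed $z$, which is the situation described for sequence (9.9).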

Theorem 9.1. If the functions $f_0(x)$ and $f_i(x)$, $i = 1, \dots, m$, are
twice continuously differentiable and, besides, the functions $f_i$ are such
that condition (9.12) is fulfilled, and the set $S = S_g \cap S_0$
($S_0 = \{x : f_0(x) \le f_0(x_0)\}$) is bounded for an arbitrary choice of the point $x_0$,
then on sequence (9.13) $f_0(x_{k+1}) < f_0(x_k)$ and $\|\varphi'(z_k)\| \to 0$ as
$k \to \infty$.

Proof. The possibility of constructing sequence (9.13) follows
from condition (9.12): with sufficiently small values of the parameter $\alpha_k$,
the point $z_{k+1}$ lies in the region about the point $z_k$ where the function $y(z)$ is
defined. In this region, by the assumptions of the theorem, the function
$\varphi(z) = f_0(z, y(z))$ is twice continuously differentiable. Taking this
into account, the following estimate holds:

$$\varphi(z_{k+1}) - \varphi(z_k) \le \alpha_k \|\varphi'(z_k)\|^2 \Bigl( -1 + \frac{\alpha_k}{2} \|\varphi''(\bar z_k)\| \Bigr) \tag{9.15}$$

where $\varphi''(\bar z_k) = \varphi''(z_k + \theta (z_{k+1} - z_k))$, $\theta \in [0, 1]$; hence, if the value
of $\alpha_k$ is sufficiently small, inequality (9.14) will be satisfied.
This means that on the elements of sequence (9.13) the function $f_0(x)$
decreases monotonically.

We shall prove now that || qm’ (z,) || > 0. On the closed bounded 
set S continuous function | g, (x) |, assumes its minimum value y 
(Weierstrass’ theorem), and by (9.12) y > 0 (function | g, (z) |, 
is continuous being the maximum of continuous functions | g, (x) |,). 
The number of different functional determinants | g, | is finite and, 
since functions f; are differentiable on set S, they are all uniformly 
continuous. Therefore, for any constant 0 < y, <y, there is a constant
9 > 0 such that at any point of set S which belongs to sphere
Sg of radius 9 and having its centre at an arbitrary roint O0€ S, the 
absolute value of determinant | g, (x) | assuming at point 0 the 
value of | g, (9)| will be not less than y,. Moreover, since set S is 
bounded and the first and second partial derivatives of functions f; 
are continuous, these derivatives at any point 0€ S chosen in sphere 
Sg of radius op are bounded (by a constant M). Taking this statement 
into account and according to the theorems on implicit functions, 
we can assert that in a certain parallelepiped 


foe°+ 60°), i=1,...,n7 


belonging to sphere Sg, system (9.1) defines at least one twice 
continuously differentiable vector-function y (z) and in this parallel- 
epiped the derivatives of any such function are bounded: 


ly (io, lly’ (2) Il< M1. (9.16) 


Since the derivatives y’ (z), y” (z) are bounded and the first and 
second derivatives of function f, (x) on set S are bounded too, the 
derivatives of function @ (z) in parallelepiped [6' + 66°] are also 
bounded: || 9’ (2) || < Me, 9" (2) || < No. 

Taking into account the above remarks, we can ascertain that
there is a constant $\delta > 0$ such that if $\alpha_k \le \delta$ then point
$x_{k+1}$ is in the sphere $S_{x_k}$ of radius $\rho$. Indeed, suppose that
$\|x_{k+1} - x_k\|^2 = \|z_{k+1} - z_k\|^2 + \|y_{k+1} - y_k\|^2 = \rho^2$.
Since these equalities mean that $x_{k+1} \in S_{x_k}$, this point also
belongs to the parallelepiped $[x_k^i \pm \delta x_k^i]$, $i = 1, \ldots, n$,
in which estimates (9.16) hold for the derivatives of function $y(z)$.
Consequently, $\|y_{k+1} - y_k\| \le N_1 \|z_{k+1} - z_k\|$ and therefore it
follows from the preceding equalities that
$(N_1^2 + 1)\|z_{k+1} - z_k\|^2 = \alpha_k^2 (N_1^2 + 1)\|\varphi'(z_k)\|^2 \ge \rho^2$;
hence

$$\alpha_k \ge \frac{\rho}{\sqrt{N_1^2 + 1}\,\|\varphi'(z_k)\|} \ge \frac{\rho}{\sqrt{N_1^2 + 1}\,N_2}.$$

This estimate shows that the equality $\|x_{k+1} - x_k\| = \rho$ can hold
only with $\alpha_k \ge \rho/(\sqrt{N_1^2 + 1}\,N_2)$, i.e. that we can choose
as $\delta$ any constant not exceeding $\rho/(\sqrt{N_1^2 + 1}\,N_2)$.

Using now inequality (9.15) (and taking into account that the derivative
$\varphi''(z)$ is bounded), it is easy to ascertain that inequality (9.14)
will certainly hold with $\alpha_k = \min\{\delta,\ 2(1 - \varepsilon)/N_2\}$.
But this means, since $f_0(x)$ has a lower bound (on set $S$), that as
$k \to \infty$ of necessity $\|\varphi'(z_k)\| \to 0$. The theorem is proved.
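The iteration analysed in the theorem can be sketched in a few lines. The example below is only an illustration under stated assumptions: the test problem (minimize $(z-2)^2 + y^2$ subject to $y^3 + y - z = 0$), the function names, and the use of Newton's method to solve the constraint for $y$ are all hypothetical choices, not taken from the text. It moves along $-\varphi'(z_k)$, restores the constraint by re-solving for $y$, and halves the step until an inequality of the form (9.14) holds.

```python
# Hypothetical test problem (not from the text): minimize
# f0(z, y) = (z - 2)^2 + y^2 subject to g(z, y) = y^3 + y - z = 0.

def f0(z, y):
    return (z - 2.0) ** 2 + y ** 2

def g(z, y):
    return y ** 3 + y - z

def g_y(z, y):
    return 3.0 * y ** 2 + 1.0   # never zero, so y(z) exists for every z

def solve_y(z, y_start, tol=1e-14, max_iter=50):
    """Newton's method for g(z, y) = 0 with z fixed (restores the constraint)."""
    y = y_start
    for _ in range(max_iter):
        r = g(z, y)
        if abs(r) < tol:
            break
        y -= r / g_y(z, y)
    return y

def reduced_gradient(z, y):
    """phi'(z) = f0_z + y'(z) * f0_y, with y'(z) = -g_z / g_y and g_z = -1."""
    y_prime = 1.0 / g_y(z, y)
    return 2.0 * (z - 2.0) + y_prime * 2.0 * y

def projection_gradient(z0, y0, eps=0.25, tol=1e-6, max_iter=200):
    z, y = z0, solve_y(z0, y0)
    for _outer in range(max_iter):
        d = reduced_gradient(z, y)
        if abs(d) < tol:
            break
        alpha = 1.0
        for _reduction in range(60):             # step reductions
            z_new = z - alpha * d                # move in z ...
            y_new = solve_y(z_new, y)            # ... then return to g = 0
            if f0(z_new, y_new) - f0(z, y) <= -eps * alpha * d * d:
                break
            alpha *= 0.5
        z, y = z_new, y_new
    return z, y

z_min, y_min = projection_gradient(0.0, 0.0)
```

Each trial step costs one nonlinear solve for $y$, which is the expense the remarks on implementation aim to reduce.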

The condition $\|\varphi'(z_k)\| \to 0$ means in the general case that
sequence (9.13) (or a certain subsequence of it) converges to a point $x_*$
which satisfies the necessary condition for an extremum of function
$f_0(x)$ on manifold $S_g$ (at point $x_*$ the gradient $f_0'(x_*)$ is orthogonal


200 


PROJECTION METHODS 


to the tangent hyperplane $g'(x_*)(x - x_*) = 0$) (Chap. I, Sec. 4).
Since function $f_0(x)$ is continuous, its minimum on set $S$ exists.
If sequence (9.13) converges to the solution and function $\varphi(z)$,
to whose minimization in a certain region about the minimum
the solving of the original problem is reduced, satisfies the
conditions $m\|v\|^2 \le (\varphi''(z)v, v) \le M\|v\|^2$ for any
$v \in E^{n-m}$, then the rate of convergence will not be slower than that
of a certain geometric progression; this follows from the results on the
convergence of gradient methods (theorem 1.2, Chap. II). We shall dwell
on several questions connected with the implementation of the
method.

In the theorem proved above, use was made in defining vectors
$z_k$ and $y_k$ of the determinant $|\bar g_y(x_k)|$. To find this determinant
it is necessary at each iteration to calculate all the determinants
$|g_y|$. However, in practice there is no need to do so (the determinant
$|\bar g_y|$ is used in the theorem only to simplify the proof). The
convergence of the method is retained if we choose vectors $z_k$ and
$y_k$ corresponding to any of the determinants $|g_y|$ whose absolute
value at point $x_k$ is not less than an arbitrary small positive
constant $\mu$ (the same for all $k$). With the conditions of the theorem
such a constant exists since there is the constant $\gamma$. Therefore in
implementing the algorithm, vectors $z$ and $y$ should be chosen
corresponding to the same determinant until at a certain point its
absolute value becomes less than $\mu$; only if this has occurred is it
necessary to pass to other vectors $z$ and $y$, i.e. to calculate another
determinant $|g_y|$. The constant $\mu$ is arbitrary. It can occur that at
a certain point $x_k$ all the determinants $|g_y|$ have an absolute value
less than $\mu$. We have then to choose a new constant $\mu_1 < \mu$. At each
reduction of parameter $\alpha$ which is necessary to fulfill the inequality
(9.14), a new evaluation of function $y(z)$ is required (for the evaluation
of $f_0(x) = f_0(z, y(z))$), i.e. we have to solve the system of nonlinear
equations (9.1) with a fixed value of vector $z$. To reduce the amount of
computations, the required value of the parameter should be determined
by first establishing the following inequality:

$$f_0(z, y_k(z)) - f_0(z_k, y_k) \le -\varepsilon\alpha\|\varphi'(z_k)\|^2, \quad 0 < \varepsilon < 1. \qquad (9.17)$$

As soon as this inequality is satisfied, the inequality (9.14) should
be checked with the $\alpha$ obtained; if (9.14) is not satisfied, the
reduction of $\alpha$ should be continued; otherwise the obtained value of the
parameter should be retained or one should attempt to increase
it in checking (9.14). Note that

$$y(z) = y_k + y'(z_k)(z - z_k) + O(\|z - z_k\|^2) = y_k(z) + O(\|z - z_k\|^2).$$


201 


CONSTRAINED FUNCTION MINIMIZATION 


With a sufficiently small $z_{k+1} - z_k$, we obtain
$f_0(z_{k+1}, y_k(z_{k+1})) \to f_0(z_{k+1}, y(z_{k+1}))$; therefore if (9.17)
is satisfied, inequality (9.14) will also be satisfied, i.e. no additional
reductions of the step length will be required.

Remark. The requirements of theorem 9.1 on the smoothness of
functions $f_0(x)$ and $f_i(x)$ can be somewhat weakened; however,
this leads to a more complicated proof.

We dwell briefly also on a method of the (9.9) type in which vector
$p_k$ is chosen by formula (9.11) (vectors $z_k$, $y_k$ are determined in the
same way as in the preceding method) and $\alpha_k$ is the maximum value
of the parameter (obtained by reductions) which satisfies the
following inequality:

$$f_0(z, y(z)) - f_0(z_k, y_k) \le \varepsilon\alpha(\varphi'(z_k), p_k), \quad z = z_k + \alpha p_k.$$

For such an algorithm, theorem 9.1 holds true. The proof will
differ only in some details (analogously to the difference between
the proof of the theorem on the properties of methods of the gradient
type and the proof of the theorems on the method of steepest descent,
Sec. 1, Chap. II).

Note that the amount of work per iteration in such an algorithm
is greater than in method (9.13).


Method of the Second Order 


Suppose that $f_0(x)$ is a strongly convex function. Then the
quadratic function $F_k(x)$ (9.7) is strictly convex and, since function
$y_k(z)$ is linear, function $\psi_k(z) = F_k(z, y_k(z))$ is strictly convex
too. More precisely, due to the strong convexity of $f_0(x)$, the
following conditions are fulfilled for any function $\psi_k(z)$ with any
vector $v \in E^{n-m}$:

$$m_0\|v\|^2 \le (\psi_k'' v, v) \le M_0\|v\|^2, \quad m_0 > 0 \qquad (9.18)$$

where matrix $\psi_k'' = f''_{0zz} + y'^{*} f''_{0yy}\, y' + 2 y'^{*} f''_{0zy}$ (all the
derivatives are calculated at point $x_k$). In this case vector $p_k$ which
minimizes $\psi_k(z)$ is calculated by the formula

$$p_k = -(\psi_k'')^{-1}\psi_k'(z_k). \qquad (9.19)$$

In the method of the second order, point $x_{k+1}$, $k = 0, 1, \ldots$, is
constructed as follows:

$$z_{k+1} = z_k - \alpha_k(\psi_k'')^{-1}\varphi'(z_k), \quad y_{k+1} = y(z_{k+1}) \qquad (9.20)$$

where vectors $z_k$ and $y_k$ are determined in the same way as in
constructing method (9.13), and for $\alpha_k$ we take the maximum value of
the parameter (obtained by reductions) which satisfies the following
inequality:

$$f_0(x) - f_0(x_k) \le \varepsilon\alpha(\varphi'(z_k), p_k), \quad 0 < \varepsilon < \frac{1}{2} \qquad (9.21)$$

where $x = (z, y(z))$, $z = z_k + \alpha p_k$.
Theorem 9.2. Let $f_0(x)$ be a twice continuously differentiable
function and for any vector $\omega \in E^n$

$$m\|\omega\|^2 \le (f_0''(x)\omega, \omega) \le M\|\omega\|^2, \quad m > 0,$$

and let functions $f_i(x)$, $i = 1, \ldots, m$, satisfy the requirements of
theorem 9.1. Then, whatever the point $x_0$ chosen, the results of theorem
9.1 hold for method (9.20).

The proof of the theorem follows the same scheme as that used
in proving theorem 9.1. Therefore, we shall dwell only on those
changes in the proof, as compared to that of theorem 9.1, which
arise because of the different method of choosing vector $p_k$.

Due to the strong convexity of $f_0(x)$, the set $S_0$ is bounded. It
follows that set $S = S_g \cap S_0$ is bounded and closed (since sets
$S_g$ and $S_0$ are closed). Taking this into account, we prove, in the
same way as in theorem 9.1, that estimates (9.16) hold and establish
that the derivatives $\varphi'(z)$, $\varphi''(z)$ are bounded in the
parallelepiped $[\theta^i \pm \delta\theta^i]$, $i = 1, \ldots, n$.


Further, by (9.18), $\|(\psi_k'')^{-1}\| \le 1/m_0$; consequently,
$\|p_k\| = \|(\psi_k'')^{-1}\psi_k'(z_k)\| = \|(\psi_k'')^{-1}\varphi'(z_k)\| \le N_2/m_0$,
and, therefore, if
$(N_1^2 + 1)\|z_{k+1} - z_k\|^2 = \alpha_k^2 (N_1^2 + 1)\|p_k\|^2 \ge \rho^2$, then

$$\alpha_k \ge \frac{\rho}{\sqrt{N_1^2 + 1}\,\|p_k\|} \ge \frac{\rho m_0}{\sqrt{N_1^2 + 1}\,N_2}. \qquad (9.22)$$

It follows that any constant, provided it does not exceed
$\rho m_0/(\sqrt{N_1^2 + 1}\,N_2)$, can be chosen as $\delta$.

Using the expansion of function $\varphi(z)$ into Taylor's series, we find
that in the sphere $S_{x_k}$ of radius $\rho$

$$\varphi(z_{k+1}) - \varphi(z_k) = \alpha_k(\varphi'(z_k), p_k) + \frac{\alpha_k^2}{2}(\varphi''(\bar z_k)p_k, p_k) \le \alpha_k(\varphi'(z_k), p_k)\left(1 + \frac{\alpha_k N_2\|p_k\|^2}{2(\varphi'(z_k), p_k)}\right).$$

It follows from (9.19), with account of (9.18), that

$$(\psi_k'' p_k, p_k) = -(\varphi'(z_k), p_k) \ge m_0\|p_k\|^2.$$

This implies that

$$\varphi(z_{k+1}) - \varphi(z_k) \le \alpha_k(\varphi'(z_k), p_k)\left(1 - \frac{\alpha_k N_2}{2 m_0}\right).$$

Taking into account this estimate and inequalities (9.22), we
establish that inequality (9.21) will certainly hold with

$$\alpha_k = \min\left\{\delta,\ \frac{2(1 - \varepsilon)m_0}{N_2}\right\}.$$

This means, since $f_0(x)$ has a lower bound, that

$$(\varphi'(z_k), p_k) \to 0. \qquad (9.23)$$

Since $-(\varphi'(z_k), p_k) = ((\psi_k'')^{-1}\varphi'(z_k), \varphi'(z_k)) \ge \frac{1}{M_0}\|\varphi'(z_k)\|^2$, it
follows from (9.23) that $\|\varphi'(z_k)\| \to 0$. This completes the proof
of the theorem.

In implementing the algorithm (9.20) one should take account
of the remarks concerning the choice of vectors $z_k$, $y_k$ and parameter
$\alpha_k$ made in studying method (9.13).

If sequence (9.20) converges to the solution and the condition

$$\psi_k'' \to \varphi''(z_k) \qquad (9.24)$$

holds for function $\varphi(z)$ to whose minimization the solving of the
original problem is reduced, then the rate of convergence of
the method is superlinear. In order to ascertain this, one should
take into account that if (9.24) holds (and also
$(\psi_k'' p_k, p_k) = -(\varphi'(z_k), p_k)$), then

$$\varphi(z_{k+1}) - \varphi(z_k) = \alpha_k(\varphi'(z_k), p_k)\left(1 - \frac{\alpha_k(\varphi''(z_k)p_k, p_k)}{2(\psi_k'' p_k, p_k)} - \frac{\alpha_k((\varphi''(\bar z_k) - \varphi''(z_k))p_k, p_k)}{2(\psi_k'' p_k, p_k)}\right) \to \alpha_k(\varphi'(z_k), p_k)\left(1 - \frac{\alpha_k}{2}\right).$$

At the same time, by (9.18) and (9.24), function $\varphi(z)$ will be strongly
convex in a certain region about the minimum. With the above
remarks, the proof of the superlinear rate of convergence can be the
same as that, for example, in studying Newton's method (Sec. 2,
Chap. II).

Thus the rate of convergence of method (9.20) in a number of
problems will be faster than that of methods of the first order.
However, the amount of work per iteration in method (9.20) may prove
considerably greater owing to the necessary calculations of the
second derivatives of function $f_0(x)$.
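A minimal sketch of iteration (9.20) may help to fix the roles of $\psi_k''$, $p_k$ and the step rule (9.21). The test problem (minimize $(z-2)^2 + y^2$ subject to $y^3 + y - z = 0$), all function names, and the Newton solve for $y$ are assumptions made for illustration only. In this example $\psi_k''$ omits the term with $y''$, so condition (9.24) is not met and only fast linear convergence should be expected.

```python
# Hypothetical test problem (not from the text): minimize
# f0(z, y) = (z - 2)^2 + y^2 subject to g(z, y) = y^3 + y - z = 0.

def f0(z, y):
    return (z - 2.0) ** 2 + y ** 2

def g(z, y):
    return y ** 3 + y - z

def solve_y(z, y, tol=1e-14, max_iter=50):
    """Newton's method for g(z, y) = 0 with z fixed."""
    for _ in range(max_iter):
        r = g(z, y)
        if abs(r) < tol:
            break
        y -= r / (3.0 * y ** 2 + 1.0)
    return y

def second_order_projection(z0, y0, eps=0.25, tol=1e-6, max_iter=100):
    z, y = z0, solve_y(z0, y0)
    iterations = 0
    for _outer in range(max_iter):
        y_prime = 1.0 / (3.0 * y ** 2 + 1.0)          # y'(z) = -g_z / g_y
        grad = 2.0 * (z - 2.0) + y_prime * 2.0 * y    # phi'(z_k)
        if abs(grad) < tol:
            break
        # psi'' = f_zz + y' f_yy y' + 2 y' f_zy; here f_zz = f_yy = 2, f_zy = 0
        psi2 = 2.0 + y_prime * 2.0 * y_prime
        p = -grad / psi2                               # formula (9.19)
        alpha = 1.0
        for _reduction in range(60):
            z_new = z + alpha * p
            y_new = solve_y(z_new, y)
            # acceptance test of the form (9.21), with 0 < eps < 1/2
            if f0(z_new, y_new) - f0(z, y) <= eps * alpha * grad * p:
                break
            alpha *= 0.5
        z, y = z_new, y_new
        iterations += 1
    return z, y, iterations

z_min, y_min, n_iter = second_order_projection(0.0, 0.0)
```

Near the solution the full step $\alpha = 1$ is accepted, so the cost per iteration is dominated by forming $\psi_k''$ and by one solve of the constraint.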


Minimization Methods 
of Higher Effectiveness 


The projection methods described in the preceding subsections
are, in a sense, analogues of the gradient methods and Newton's
method for solving problems of finding an absolute extremum.
They share the shortcomings of the corresponding methods: either
they have a slow rate of convergence (like methods of the first order)
or they involve a greater amount of work per iteration (like methods
of the second order). However, the fact that in the algorithms under
consideration the solving of the original problem is reduced to
unconstrained minimization of a function (one or several functions,
depending on whether condition (9.8) or (9.12) holds) makes it
possible to use such effective minimization algorithms as methods
of dual and conjugate directions (Secs. 3-9, Chap. II). Thus, if
functions $f_i(x)$, $i = 1, \ldots, m$, are such that condition (9.8) is
satisfied and with any fixed $z$ system (9.1) has a unique solution
$y = y(z)$, then, if $\varphi(z) = f_0(z, y(z))$ is a twice continuously
differentiable strongly convex function, any method of dual or conjugate
directions (for the minimization of $\varphi(z)$) converges to the solution
at a superlinear rate. At the same time if use is made of the variants
of methods with restoration of matrices $A_k^{-1}$ and $H_k$ after a finite
number of steps (see the subsection on p. 104), then the convergence
of methods of dual and conjugate directions will be guaranteed with
the same assumptions about function $\varphi(z)$ as in gradient methods.

Consider, for example, the problem of the minimization of a
quadratic function $f_0(x)$ with linear constraints: $g(x) = Ax + b = 0$,
where $A = (a_{ij})$ is an $m \times n$ matrix, $b = (b^1, \ldots, b^m)$.
Let the determinant $|(a_{ij})| \ne 0$, $i, j = 1, \ldots, m$; then we can
take $y = (x^1, \ldots, x^m)$, $z = (x^{m+1}, \ldots, x^n)$.
Since $y(z)$ is a linear function, $\varphi(z)$ is a quadratic function of the
variable $z$ and it is strictly convex if the original function $f_0(x)$
is strictly convex. The application of any method of dual or
conjugate directions makes it possible to find the minimum of function
$\varphi(z)$ after $n - m$ steps.
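The finite termination claimed here can be checked on a small instance. The following sketch is illustrative only: the data $Q$, $a$, $b$, the helper names, and the particular conjugate gradient variant (exact line search with the Polak-Ribiere coefficient) are assumptions, not part of the text. It eliminates $y = x^1$ through the single linear constraint and runs exactly $n - m = 2$ conjugate gradient steps on $\varphi(z)$.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

# Hypothetical data: f0(x) = 0.5 * (Qx, x), one constraint (a, x) + b = 0.
Q = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]
a = [1.0, 1.0, 1.0]
b = -1.0
y_prime = [-ai / a[0] for ai in a[1:]]      # dy/dz for y = x^1

def x_of_z(z):
    """Restore x = (y(z), z) from the free variables z."""
    return [-(b + dot(a[1:], z)) / a[0]] + list(z)

def reduce_vec(w):
    """Map a vector w in x-space to z-space: w_z + y'^T w_y."""
    return [w[i + 1] + y_prime[i] * w[0] for i in range(len(y_prime))]

def grad_phi(z):
    return reduce_vec(matvec(Q, x_of_z(z)))

def hess_phi_times(p):
    lifted = [dot(y_prime, p)] + list(p)     # direction in x-space
    return reduce_vec(matvec(Q, lifted))

def conjugate_gradients(z, steps):
    grad = grad_phi(z)
    p = [-gi for gi in grad]
    for _ in range(steps):
        alpha = -dot(grad, p) / dot(p, hess_phi_times(p))   # exact line search
        z = [zi + alpha * pi for zi, pi in zip(z, p)]
        new_grad = grad_phi(z)
        change = [gn - go for gn, go in zip(new_grad, grad)]
        beta = dot(new_grad, change) / dot(grad, grad)
        p = [-gn + beta * pi for gn, pi in zip(new_grad, p)]
        grad = new_grad
    return z

n, m = 3, 1
x_min = x_of_z(conjugate_gradients([0.0, 0.0], n - m))
```

For these data the constrained minimizer is $x = (6/11,\ 3/11,\ 2/11)$, which the two steps reproduce up to rounding.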

If functions $f_0(x)$ and $f_i(x)$, $i = 1, \ldots, m$, satisfy the
requirements of theorem 9.2 (the condition (9.8) in this case is replaced
by the weaker requirement (9.12)), then methods of dual or conjugate
directions can be used to minimize each of the functions $\varphi(z)$ that
are to be dealt with in solving the problem. In other words, in
algorithms of the (9.9) type, vector $p_k$ and parameter $\alpha_k$ can be
determined in the same way as in dual or conjugate directions methods,
and vectors $z_k$ and $y_k$ can be chosen as in studying method (9.13).
Algorithms thus constructed (implemented with restoration of
matrix $A_k^{-1}$ or $H_k$) converge under the same conditions as projection
methods of the first and second orders. At the same time, their
effectiveness is higher than that of the projection methods studied;
in particular, with a small increase in the amount of work per
iteration as compared to that in the method of the first order, a
superlinear rate of convergence can be attained.

In practice, just the algorithms of the type described in this subsection
should be used for solving the problem being studied.




Note also that the amount of work per iteration (method (9.20))
can be reduced if instead of matrix $f_0''(x_k)$ use is made of matrix
$D_k$, defined by the following system of equations:

$$D_k(x_{k-i} - x_{k-i-1}) = f_0'(x_{k-i}) - f_0'(x_{k-i-1}), \quad i = 0, 1, \ldots, n - 1$$

(analogue of system (3.6) of Chap. II which is used in constructing
methods of dual directions) and vector $p_k = -F_k^{-1}\varphi'(z_k)$ is
constructed, where $F_k = D_{kzz} + y'^{*} D_{kyy}\, y' + 2 y'^{*} D_{kzy}$, and matrices
$D_{kzz}$, $D_{kyy}$, $D_{kzy}$ are parts of matrix $D_k$ and correspond to
matrices $f''_{0zz}$, $f''_{0yy}$, $f''_{0zy}$, respectively.


On the Solving of the General Problem 
of Mathematical Programming 


It is required to minimize function $f_0(x)$ with constraints

$$f_i(x) \le 0, \quad i = 1, \ldots, m. \qquad (9.25)$$

Such constraints can be reduced to equality constraints in several
ways. For instance, if we introduce additional variables
$x^{n+1}, \ldots, x^{n+m}$, then constraints (9.25) will hold with the same
values of variables $x^1, \ldots, x^n$ which satisfy the equalities

$$(x^{n+i})^2 + f_i(x) = 0, \quad i = 1, \ldots, m. \qquad (9.26)$$

Consequently, the minimum of function $f_0(x)$ with constraints
(9.25) will coincide with the minimum of $f_0(x)$ with constraints
(9.26). For the minimization of $f_0(x)$ with constraints (9.26), methods
of the first order described in the subsection on p. 249 can be used.

Method (9.20) with conditions (9.26) cannot be applied to the
minimization of $f_0(x)$, for in space $E^{n+m}$ function $f_0(x)$ is not
strictly convex; matrix $f_0''(x)$, as is easily ascertained, is singular
in $E^{n+m}$. Due to this fact, there is no point in using methods of dual
and conjugate directions in this case.
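The reduction (9.26) and the singularity remark can both be seen on a one-dimensional instance. In the sketch below everything specific is an assumption chosen for illustration: the problem (minimize $(x-2)^2$ subject to $x - 1 \le 0$), the slack variable name `s`, and the simple step-halving gradient descent used as the first-order method.

```python
# Hypothetical one-dimensional problem: minimize f0(x) = (x - 2)^2 subject to
# x - 1 <= 0.  With a slack variable s the constraint becomes s^2 + x - 1 = 0.

def f0(x):
    return (x - 2.0) ** 2

def x_of_s(s):
    """Solve the equality s^2 + x - 1 = 0 for x (here y = x, z = s)."""
    return 1.0 - s ** 2

def phi(s):
    return f0(x_of_s(s))

def phi_prime(s):
    # chain rule: f0'(x) * dx/ds with dx/ds = -2 s
    return 2.0 * (x_of_s(s) - 2.0) * (-2.0 * s)

def minimize_over_slack(s, eps=0.25, tol=1e-8, max_iter=200):
    for _outer in range(max_iter):
        d = phi_prime(s)
        if abs(d) < tol:
            break
        alpha = 1.0
        for _reduction in range(60):         # step reductions
            if phi(s - alpha * d) - phi(s) <= -eps * alpha * d * d:
                break
            alpha *= 0.5
        s -= alpha * d
    return s

s_min = minimize_over_slack(0.8)
x_min = x_of_s(s_min)

# Hessian of f0 in the extended variables (x, s): f0 does not depend on s,
# so the row and column belonging to s vanish and the matrix is singular.
hessian_extended = [[2.0, 0.0], [0.0, 0.0]]
determinant = (hessian_extended[0][0] * hessian_extended[1][1]
               - hessian_extended[0][1] * hessian_extended[1][0])
```

The first-order iteration drives the slack to zero, i.e. the constraint becomes active at $x = 1$, while the zero determinant shows why a Newton-type step in the extended space is not available.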


Concluding Remarks


Of the class of projection methods with restoration of ties we have
discussed only those algorithms in which use is made of formulas
(9.9) to restore ties. In a number of problems this method of carrying
out the concluding (third) stage of iteration may prove inconvenient;
in this case it is worth while to carry out this stage in another way.
For instance, one can determine point $x_{k+1} \in S_g$ so that the quantity
$\|x_{k+1}(\alpha) - x_k(\alpha)\|$ is the minimum distance between point
$x_k(\alpha)$ and set $S_g$.

The amount of work per iteration in projection methods diminishes
when approaching the solution (or a stationary point of function
$f_0(x)$ on $S_g$) since the problem of determining point $x_{k+1}$ (using
point $x_k$) becomes simpler. So, for example, the amount of work involved
in solving system (9.1) with a fixed vector $z_{k+1}$ (i.e. in evaluating
function $y(z_{k+1})$) diminishes as we approach the solution of the problem
because point $y_k(z_{k+1})$ with increasing $k$ approximates the solution
$y(z_{k+1})$ better and better. In this sense, projection methods differ
advantageously from penalty function methods where we have to
solve problems of greater and greater complexity in order to obtain
a more precise approximation to the solution.
quadratic programming which converge either after a finite number or an infinite

COMPUTATIONAL SCHEMES
OF THE MAIN ALGORITHMS


I. METHOD OF DUAL DIRECTIONS 
(CHAP. II, SEC. 3) 


This method is intended for the minimization of a convex function $f(x)$,
$x \in E^n$.
Iteration scheme.
Let $x_0$ be an arbitrary point, $s_{0,0}, s_{0,-1}, \ldots, s_{0,-n+1}$ be an
arbitrary linearly independent vector system.
With $0 \le k \le n - 1$, the iteration is as follows:
(1) Construct the point
$$x_{k+1} = x_k - \alpha_k f'(x_k) \qquad (1)$$
where $\alpha_k$ is chosen by any of the methods described in Chap. II, Sec. 1.
(2) Set:
$$r_{k+1} = x_{k+1} - x_k, \quad e_{k+1} = f'(x_{k+1}) - f'(x_k). \qquad (2)$$
(3) Compute $(s_{k,k-n+1}, e_{k+1})$.
If
$$|(s_{k,k-n+1}, e_{k+1})| \ge \gamma\,\|s_{k,k-n+1}\|\,\|e_{k+1}\| \qquad (3)$$
where $\gamma > 0$ is an arbitrarily small constant, go to step (5).
If
$$|(s_{k,k-n+1}, e_{k+1})| < \gamma\,\|s_{k,k-n+1}\|\,\|e_{k+1}\|, \qquad (4)$$
go to step (4).
(4) Set
$$r_{k+1} = \beta_{k+1} s_{k,k-n+1} \qquad (5)$$
where the quantity $\beta_{k+1} > 0$ is chosen such that the condition
$\|r_{k+1}\| \le \|r_k\|$ be fulfilled.
Compute the gradient $f'(x_k + r_{k+1})$ and then construct vector
$e_{k+1} = f'(x_k + r_{k+1}) - f'(x_k)$. Then go to step (5).
(5) Construct the vector system
$$s_{k+1,k+1} = \frac{s_{k,k-n+1}}{(s_{k,k-n+1}, e_{k+1})},$$
$$s_{k+1,k-j} = s_{k,k-j} - (s_{k,k-j}, e_{k+1})\, s_{k+1,k+1}, \quad j = 0, 1, \ldots, n - 2. \qquad (6)$$
This is the end of the iteration.
With $k \ge n$:
(1) Construct vector
$$p_k = -\sum_{i=0}^{n-1} (f'(x_k), s_{k,k-i})\, r_{k-i}.$$
(2) Compute $(f'(x_k), p_k)$.

If $(f'(x_k), p_k) < 0$, construct point $x_{k+1}$ by one of the formulas
$x_{k+1} = x_k + \alpha_k p_k$, where $\alpha_k$ is chosen according to condition
(2.2), Chap. II.

If $(f'(x_k), p_k) \ge 0$, construct point $x_{k+1}$ using the gradient method
(see (1)) of the iteration with $k \le n - 1$. Further, construct the iteration
in the same way as with $k \le n - 1$ (steps (2)-(5)).

Remarks. 1. We have adduced only one of the possible computation schemes
of the methods of dual directions. Here the first iterations ($k \le n - 1$) are
carried out by the gradient method. Since at the initial steps of the iterative
process the gradient method usually provides for a sufficiently steep decrease of
the function, such an initial stage of the process is expedient in solving many
problems.

2. We have changed here for the sake of convenience the notations of the
vectors of the dual basis (cf. (6), and (3.21) of Chap. II).

3. If vector $r_{k+1}$ is chosen in form (5) and function $f(x)$ is smooth and
strongly convex (i.e. conditions (2.4) of Chap. II hold true), then inequality
(3) will automatically hold, provided the constant $\gamma$ has been chosen
sufficiently small. Indeed, if function $f(x)$ satisfies the requirements
formulated, then $\|e_k\| \le M\|r_k\|$ and estimate (5.18) of Chap. II holds
true. By virtue of this fact, we have

$$(s_{k,k-n+1}, e_{k+1}) = \frac{1}{\beta_{k+1}}(r_{k+1}, e_{k+1}) \ge \frac{m}{\beta_{k+1}}\|r_{k+1}\|^2 \ge \frac{m}{\beta_{k+1}M}\|r_{k+1}\|\,\|e_{k+1}\| = \frac{m}{M}\|s_{k,k-n+1}\|\,\|e_{k+1}\|.$$

Thus if $\gamma \le m/M$, then inequality (3) is satisfied.

4. The practice of computations shows that quantity $\gamma$ may be chosen very
small: $\gamma = 10^{-6}$ to $10^{-15}$. If condition (3) is not satisfied even with
vector $r_{k+1}$ having been chosen in form (5), it means that matrix $f''(x)$
as $x \to x_*$ becomes ill conditioned, i.e. the minimized function is not
strongly convex. In particular, the level surfaces of this function may have
the form of a long, deep and narrow valley. In this case, a very accurate
approximation to the solution with respect to the variable is not attainable.
However, computational practice shows that one can obtain function values
sufficiently close to the minimum in minimizing even nonconvex functions
with deep-valley level surfaces.


II. CONJUGATE GRADIENTS METHOD
(CHAP. II, SEC. 4)


This method is intended for the minimization of a convex function $f(x)$,
$x \in E^n$.
Iteration scheme.
Let $x_0$ be an arbitrary point, $p_0 = -f'(x_0)$.
(1) If $0 \le k \le n - 1$, go to (2). If $k = n$, go to (5).
(2) Construct point
$$x_{k+1} = x_k + \alpha_k p_k$$
where factor $\alpha_k$ is determined under the condition
$$f(x_k + \alpha_k p_k) = \min_{\alpha \ge 0} f(x_k + \alpha p_k).$$
(3) Compute vector
$$p_{k+1} = -f'(x_{k+1}) + \beta_{k+1} p_k$$
where
$$\beta_{k+1} = -\frac{(f'(x_{k+1}),\ f'(x_{k+1}) - f'(x_k))}{(f'(x_k), p_k)}.$$
(4) Go to (1).

(5) Set $x_0 = x_n$, $p_0 = -f'(x_0)$ and repeat the process (go to (1)).

Remark. Coefficient $\beta_{k+1}$ can be determined by any of the formulas (4.73),
Chap. II.
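The scheme of the conjugate gradients method translates almost line by line into code. The sketch below is an assumption-laden illustration rather than a reference implementation: the golden-section routine `line_min` stands in for the exact one-dimensional minimization in step (2), the zero-denominator guard is an added safeguard, and the test function is a hypothetical convex example with minimum at $(1, -0.5)$.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def line_min(phi, iters=80):
    """Golden-section search for the minimum of a convex phi over alpha >= 0."""
    hi = 1.0
    while phi(2.0 * hi) < phi(hi) and hi < 1e6:   # expand until bracketed
        hi *= 2.0
    lo, hi = 0.0, 2.0 * hi
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        left = hi - ratio * (hi - lo)
        right = lo + ratio * (hi - lo)
        if phi(left) < phi(right):
            hi = right
        else:
            lo = left
    return 0.5 * (lo + hi)

def conjugate_gradients(f, grad, x0, tol=1e-6, max_cycles=50):
    n = len(x0)
    x = list(x0)
    for _cycle in range(max_cycles):
        g = grad(x)
        p = [-gi for gi in g]                     # step (5): restart
        for _k in range(n):                       # steps (1)-(4), k = 0..n-1
            if math.sqrt(dot(g, g)) < tol:
                return x
            alpha = line_min(lambda t: f([xi + t * pi for xi, pi in zip(x, p)]))
            x = [xi + alpha * pi for xi, pi in zip(x, p)]
            g_new = grad(x)
            denom = dot(g, p)
            diff = [a - b for a, b in zip(g_new, g)]
            # coefficient from step (3); fall back to steepest descent if denom is 0
            beta = -dot(g_new, diff) / denom if denom != 0.0 else 0.0
            p = [-gn + beta * pi for gn, pi in zip(g_new, p)]
            g = g_new
    return x

def f(x):
    u, v = x[0] - 1.0, x[1] + 0.5
    return u * u + 2.0 * v * v + u * v + 0.1 * u ** 4

def grad(x):
    u, v = x[0] - 1.0, x[1] + 0.5
    return [2.0 * u + v + 0.4 * u ** 3, 4.0 * v + u]

x_min = conjugate_gradients(f, grad, [0.0, 0.0])
```

With an exact line search $(f'(x_k), p_k) = -\|f'(x_k)\|^2$, so the coefficient in step (3) coincides with the familiar Polak-Ribiere form.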


III. METHOD OF FEASIBLE DIRECTIONS
(CHAP. III, SEC. 2)


This method is intended for solving problems of convex programming: to
minimize function $f_0(x)$ with constraints

$$f_i(x) \le 0, \quad i = 1, \ldots, m,$$
$$Ax - b = 0$$

where $x \in E^n$, $f_i(x)$, $i = 0, \ldots, m$ are convex continuously
differentiable functions, $A$ is an $l \times n$ matrix, $b$ is an
$l$-dimensional vector.
Notations:

$$J_\delta(x) = \{i : f_i(x) \ge -\delta,\ i = 1, \ldots, m\},$$
$$\|p\| = \max_{1 \le j \le n} |p^j|$$

where $p \in E^n$, $p^j$ are components of vector $p$.
Initial data: $x_0$ is the initial approximation satisfying all the constraints;
$\delta_0 > 0$, $\xi_i > 0$, $i = 0, \ldots, m$ are positive numbers which,
speaking generally, are arbitrary.
The common step of the algorithm.
Point $x_k$ and number $\delta_k > 0$ have been computed.
(1) Solve the problem of linear programming

$$\min \eta,$$
$$(f_i'(x_k), p) \le \xi_i \eta, \quad i \in J_{\delta_k}(x_k) \cup \{0\},$$
$$Ap = 0,$$
$$-1 \le p^j \le +1, \quad j = 1, \ldots, n.$$

The solution is $\eta_k$, $p_k$.
(2) If $\eta_k < -\delta_k$, then

$$x_{k+1} = x_k + \alpha_k p_k, \quad \delta_{k+1} = \delta_k$$

where $\alpha_k = 1/2^{q_0}$, and $q_0$ is the first integer of $q = 0, 1, \ldots$,
for which the following inequalities hold:

$$f_0\left(x_k + \frac{1}{2^q} p_k\right) \le f_0(x_k) + \frac{1}{2^q} \cdot \frac{1}{2}\,\xi_0 \eta_k,$$
$$f_i\left(x_k + \frac{1}{2^q} p_k\right) \le 0, \quad i = 1, \ldots, m.$$

(3) If $\eta_k \ge -\delta_k$, then $x_{k+1} = x_k$, $\delta_{k+1} = \frac{1}{2}\delta_k$.

(4) Return to (1).

Remark. The choice of numbers $\delta_0$, $\xi_i$ can influence the course of the
process; the choice should be made on the basis of an analysis of the problem
under consideration. The algorithm can be used also for nonconvex problems.


IV. LINEARIZATION METHOD 
(CHAP. III, SEC. 5) 


This method is intended for solving the problem: to minimize $f_0(x)$ with
constraints:

$$f_i(x) \le 0, \quad i = 1, \ldots, m,$$
$$f_i(x) = 0, \quad i = m + 1, \ldots, m + l$$

where $f_i(x)$ are continuously differentiable functions.
Notations:

$$F(x) = \max\{0, f_1(x), \ldots, f_m(x), |f_{m+1}(x)|, \ldots, |f_{m+l}(x)|\},$$
$$J_\delta(x) = \{i : f_i(x) \ge F(x) - \delta,\ i = 1, \ldots, m\},$$
$$J_\delta^0(x) = \{i : |f_i(x)| \ge F(x) - \delta,\ i = m + 1, \ldots, m + l\},$$
$$\Phi_N(x) = f_0(x) + N F(x),$$
$$\|p\|^2 = \sum_{j=1}^{n} (p^j)^2.$$

Initial data: the initial approximation $x_0$ is arbitrary; $N_0$ is sufficiently
great, $\delta_0 > 0$, $0 < \varepsilon < 1$.
The common step of the algorithm.
Point $x_k$ is constructed and numbers $N_k$ and $\delta_k$ are chosen.
(1) Solve the problem:

$$\min\ (f_0'(x_k), p) + \frac{1}{2}\|p\|^2,$$
$$(f_i'(x_k), p) + f_i(x_k) \le 0, \quad i \in J_{\delta_k}(x_k),$$
$$(f_i'(x_k), p) + f_i(x_k) = 0, \quad i \in J_{\delta_k}^0(x_k).$$

The solution is $p_k$. If the problem is incompatible, then set $x_{k+1} = x_k$,
$\delta_{k+1} = \frac{1}{2}\delta_k$, $N_{k+1} = N_k$ and return to (1).
(2) If the problem is consistent and $p_k$ is found, then set

$$x_{k+1} = x_k + \alpha_k p_k, \quad \delta_{k+1} = \delta_k$$

where $\alpha_k$ is chosen equal to $1/2^{q_0}$ and $q_0$ is the first of integers
$q = 0, 1, \ldots$, for which the relation

$$\Phi_{N_k}\left(x_k + \frac{1}{2^q} p_k\right) \le \Phi_{N_k}(x_k) - \frac{1}{2^q}\,\varepsilon\|p_k\|^2$$

holds.
(3) Let numbers $u_k^i$, $i \in J_{\delta_k}(x_k) \cup J_{\delta_k}^0(x_k)$, be Lagrange
multipliers of the subsidiary problem that was solved at the first stage.
If now

$$N_k > \sum_{i \in J_{\delta_k}(x_k)} u_k^i + \sum_{i \in J_{\delta_k}^0(x_k)} |u_k^i|,$$

then $N_{k+1} = N_k$.
Otherwise

$$N_{k+1} = 2\left(\sum_{i \in J_{\delta_k}(x_k)} u_k^i + \sum_{i \in J_{\delta_k}^0(x_k)} |u_k^i|\right).$$

(4) Return to (1).
Remark. Numbers $\delta_k$ and $N_k$ cease to change from a certain step on. The
algorithm requires an effectively working standard program for solving the
problem of quadratic programming.
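The bookkeeping parts of the linearization method (the function $F$, the active index sets, the penalty function $\Phi_N$, the step-halving rule and the update of $N$) are easy to express in code. The helpers below are a sketch under stated assumptions: all names are hypothetical, indices are 0-based within each constraint list, and the quadratic programming subproblem of step (1) is deliberately left out, so the direction `p` and the multipliers must be supplied by some external solver.

```python
def max_violation(x, ineq, eq):
    """F(x) = max{0, f_i(x) for inequalities, |f_i(x)| for equalities}."""
    values = [0.0] + [fi(x) for fi in ineq] + [abs(fi(x)) for fi in eq]
    return max(values)

def active_sets(x, ineq, eq, delta):
    """Index sets of delta-active inequality and equality constraints."""
    big_f = max_violation(x, ineq, eq)
    j_ineq = [i for i, fi in enumerate(ineq) if fi(x) >= big_f - delta]
    j_eq = [i for i, fi in enumerate(eq) if abs(fi(x)) >= big_f - delta]
    return j_ineq, j_eq

def penalty(x, f0, ineq, eq, big_n):
    """Phi_N(x) = f0(x) + N * F(x)."""
    return f0(x) + big_n * max_violation(x, ineq, eq)

def update_penalty(big_n, u_ineq, u_eq):
    """Step (3): keep N if it exceeds the multiplier sum, else double the sum."""
    total = sum(u_ineq) + sum(abs(u) for u in u_eq)
    return big_n if big_n > total else 2.0 * total

def step_length(x, p, f0, ineq, eq, big_n, eps=0.5, max_halvings=60):
    """Step (2): first alpha = 2^-q with Phi_N(x + alpha p) <= Phi_N(x) - alpha eps |p|^2."""
    base = penalty(x, f0, ineq, eq, big_n)
    p_sq = sum(pj * pj for pj in p)
    alpha = 1.0
    for _ in range(max_halvings):
        trial = [xj + alpha * pj for xj, pj in zip(x, p)]
        if penalty(trial, f0, ineq, eq, big_n) <= base - alpha * eps * p_sq:
            return alpha
        alpha *= 0.5
    return alpha

# Hypothetical data: f0 = x0^2 + x1^2, inequality 1 - x0 <= 0, equality x1 - 0.5 = 0.
objective = lambda x: x[0] ** 2 + x[1] ** 2
inequalities = [lambda x: 1.0 - x[0]]
equalities = [lambda x: x[1] - 0.5]
start = [0.0, 0.0]
```

For instance, `active_sets(start, inequalities, equalities, 0.25)` selects only the inequality, since its violation (1.0) is within 0.25 of $F$ while the equality residual (0.5) is not.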


V. ALGORITHM FOR SOLVING A SYSTEM 
OF EQUATIONS WITHOUT CALCULATING 
DERIVATIVES 
(CHAP. III, SEC. 6) 


This algorithm is intended for solving the system of equations
$$p(x) = 0$$
where $x \in E^n$, $p(x)$ is an $n$-dimensional vector-function whose components
$p^j(x)$, $j = 1, \ldots, n$ are differentiable.
Initial data: initial approximations $x_1, \ldots, x_n$ are arbitrarily chosen in
a sufficiently small region about the solution. In a particular case all $x_k$,
$k = 1, \ldots, n$ can coincide.

Notations: $\|p(x)\|^2 = \sum_{j=1}^{n} (p^j(x))^2$; $q(k)$ is equal to
$1, 2, \ldots, n - 1$ if $k$ when divided by $n$ leaves a remainder
$1, 2, \ldots, n - 1$ respectively, $q(k) = n$ if $k$ is divisible by $n$.

The common step of the algorithm. $x_1, \ldots, x_k$ have been constructed.
(1) Solve for unknowns $\beta_i$, $i = 1, \ldots, n$ the system of equations

$$\sum_{i=1}^{n} s_{k-n+i}\,\beta_i = -p(x_k)$$

where

$$s_j = \frac{1}{\|p(x_j)\|}\left[p\left(x_j + \|p(x_j)\|\,e_{q(j)}\right) - p(x_j)\right],$$

$e_i$ is a vector with zero components, except the $i$-th one which is equal to 1.
(2) Set

$$x_{k+1} = x_k + \sum_{i=1}^{n} \beta_i\, e_{q(k-n+i)}.$$

(3) Return to (1).
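The scheme can be tried on a small system. The sketch below rests on several assumptions that should be read as such: the difference step is taken equal to $\|p(x_j)\|$, all $n$ starting points coincide, the linear system is solved by a plain Gaussian elimination, and the example system and every function name are hypothetical.

```python
import math

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def solve_linear(cols, rhs):
    """Gaussian elimination with partial pivoting; cols are matrix columns."""
    n = len(rhs)
    rows = [[cols[j][i] for j in range(n)] + [rhs[i]] for i in range(n)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(rows[r][c]))
        rows[c], rows[pivot] = rows[pivot], rows[c]
        for r in range(c + 1, n):
            factor = rows[r][c] / rows[c][c]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[c])]
    beta = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(rows[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (rows[i][n] - tail) / rows[i][i]
    return beta

def difference_column(p, x, axis):
    """s = [p(x + h e_axis) - p(x)] / h with h = ||p(x)|| (assumed step)."""
    px = p(x)
    h = norm(px)
    shifted = list(x)
    shifted[axis] += h
    return [(a - b) / h for a, b in zip(p(shifted), px)]

def solve_system(p, x0, tol=1e-10, max_iter=50):
    n = len(x0)
    x = list(x0)
    if norm(p(x)) < tol:
        return x
    # All n starting points coincide; the columns use directions e_1..e_n.
    history = [(difference_column(p, x, j), j) for j in range(n)]
    k = n                                   # index of the current point x_k
    for _step in range(max_iter):
        beta = solve_linear([col for col, _axis in history], [-c for c in p(x)])
        for b, (_col, axis) in zip(beta, history):
            x[axis] += b                    # x_{k+1} = x_k + sum beta_i e_q(.)
        k += 1
        if norm(p(x)) < tol:
            break
        # Replace the oldest column by one computed at the new point;
        # (k - 1) % n is the 0-based form of q(k).
        axis = (k - 1) % n
        history = history[1:] + [(difference_column(p, x, axis), axis)]
    return x

def example_system(x):
    return [x[0] ** 2 + x[1] ** 2 - 2.0, x[0] - x[1]]

root = solve_system(example_system, [1.2, 0.9])
```

Only one new vector-function evaluation pair is needed per iteration, since $n - 1$ of the difference columns are reused from earlier points.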